[GitHub] spark pull request: [SPARK-1061] assumePartitioned

2015-09-04 Thread rapen
Github user rapen commented on the pull request:

https://github.com/apache/spark/pull/4449#issuecomment-137697278
  
@danielhaviv, see the tests in the PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9669][MESOS] Support PySpark on Mesos c...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8349#issuecomment-137709964
  
  [Test build #41995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41995/console) for PR 8349 at commit [`231c810`](https://github.com/apache/spark/commit/231c810b51b31b928e402ceebad72ca19b4314e0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10441][SQL] Save data correctly to json...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8597#discussion_r38742310
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala ---
@@ -112,11 +114,14 @@ class SimpleTextRelation(
 val fields = dataSchema.map(_.dataType)
 
 sparkContext.textFile(inputStatuses.map(_.getPath).mkString(",")).map 
{ record =>
-  Row(record.split(",").zip(fields).map { case (value, dataType) =>
+  Row(record.split(",", -1).zip(fields).map { case (v, dataType) =>
+val value = if (v == "") null else v
 // `Cast`ed values are always of Catalyst types (i.e. UTF8String 
instead of String, etc.)
 val catalystValue = Cast(Literal(value), dataType).eval()
 // Here we're converting Catalyst values to Scala values to test 
`needsConversion`
-CatalystTypeConverters.convertToScala(catalystValue, dataType)
+val scalaV = CatalystTypeConverters.convertToScala(catalystValue, 
dataType)
+
+scalaV
--- End diff --

Nit: Remove `scalaV`?
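
For context on the hunk above: Java's `String.split` (which Scala strings inherit) discards trailing empty fields unless a negative limit is passed, which is why the diff changes `record.split(",")` to `record.split(",", -1)`. A minimal sketch of the difference:

```scala
// Default limit (0) drops trailing empty strings; a negative limit keeps them,
// so a record with trailing null/empty columns round-trips correctly.
val record = "a,,c,,"

val lossy    = record.split(",")      // Array("a", "", "c")
val lossless = record.split(",", -1)  // Array("a", "", "c", "", "")

assert(lossy.sameElements(Array("a", "", "c")))
assert(lossless.sameElements(Array("a", "", "c", "", "")))
```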





[GitHub] spark pull request: [SPARK-10441][SQL] Save data correctly to json...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8597#discussion_r38742485
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala ---
@@ -100,6 +104,87 @@ abstract class HadoopFsRelationTest extends QueryTest 
with SQLTestUtils {
 }
   }
 
+  test("test all data types") {
+withTempDir { file =>
+  file.delete()
+
+  // Create the schema.
+  val struct =
+StructType(
+  StructField("f1", FloatType, true) ::
+StructField("f2", ArrayType(BooleanType), true) :: Nil)
+  val dataTypes =
+Seq(
+  StringType, BinaryType, NullType, BooleanType,
+  ByteType, ShortType, IntegerType, LongType,
+  FloatType, DoubleType, DecimalType(25, 5), DecimalType(6, 5),
+  DateType, TimestampType,
+  ArrayType(IntegerType), MapType(StringType, LongType), struct,
+  new MyDenseVectorUDT())
--- End diff --

`CalendarIntervalType` is not covered here. Is it intentional?





[GitHub] spark pull request: SPARK-10445: Extend Maven version (enforcer)

2015-09-04 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/8598#issuecomment-137684318
  
No, we require 3.3.3 explicitly to work around some problems with Maven 
3.2. `build/mvn` downloads it for you though. Do you mind closing this PR?





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137695938
  
Merged build started.





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137695988
  
  [Test build #41997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41997/consoleFull) for PR 8600 at commit [`8ff97ed`](https://github.com/apache/spark/commit/8ff97ede7250e032c88cf20a4e95f3e1e1cd416f).





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137695913
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10441][SQL] Save data correctly to json...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8597#discussion_r38742967
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala ---
@@ -100,6 +104,87 @@ abstract class HadoopFsRelationTest extends QueryTest 
with SQLTestUtils {
 }
   }
 
+  test("test all data types") {
+withTempDir { file =>
+  file.delete()
+
+  // Create the schema.
+  val struct =
+StructType(
+  StructField("f1", FloatType, true) ::
+StructField("f2", ArrayType(BooleanType), true) :: Nil)
+  val dataTypes =
+Seq(
+  StringType, BinaryType, NullType, BooleanType,
+  ByteType, ShortType, IntegerType, LongType,
+  FloatType, DoubleType, DecimalType(25, 5), DecimalType(6, 5),
+  DateType, TimestampType,
+  ArrayType(IntegerType), MapType(StringType, LongType), struct,
+  new MyDenseVectorUDT())
+  val fields = dataTypes.zipWithIndex.map { case (dataType, index) =>
+StructField(s"col$index", dataType, true)
+  }
+  val schema = StructType(fields)
+
+  // Create a RDD for the schema
+  val rdd =
+sqlContext.sparkContext.parallelize((1 to 100), 10).flatMap { i =>
+  val row1 = Row(
+s"str${i}: test save.",
+s"binary${i}: test save.".getBytes("UTF-8"),
+null,
+i % 2 == 0,
+i.toByte,
+i.toShort,
+i,
+Long.MaxValue - i.toLong,
+(i + 0.25).toFloat,
+(i + 0.75),
+BigDecimal(Long.MaxValue.toString + ".12345"),
+new java.math.BigDecimal(s"${i % 9 + 1}" + ".23456"),
+new Date(i),
+new Timestamp(i),
+(1 to i).toSeq,
+(0 to i).map(j => s"map_key_$j" -> (Long.MaxValue - j)).toMap,
+Row((i - 0.25).toFloat, Seq(true, false, null)),
+new MyDenseVector(Array(1.1, 2.1, 3.1)))
+  val row2 = Row.fromSeq(Seq.fill(dataTypes.length)(null))
+  row1 :: row2 :: Nil
+}
+  val df = sqlContext.createDataFrame(rdd, schema)
+
+  // All columns that have supported data types of this source.
+  val supportedColumns = schema.fields.filter { field =>
+supportsDataType(field.dataType)
+  }.map { field =>
+field.name
+  }
--- End diff --

Nit: Can be simplified a little bit:

```scala
val supportedColumns = schema.collect {
  case StructField(name, dataType, _, _) if supportsDataType(dataType) => name
}
```





[GitHub] spark pull request: [SQL] SPARK-6981: Factor out SparkPlanner and ...

2015-09-04 Thread evacchi
Github user evacchi commented on the pull request:

https://github.com/apache/spark/pull/6356#issuecomment-137717489
  
@marmbrus I have brought this up to date. It might need fixes to merge cleanly, though.





[GitHub] spark pull request: [SPARK-10199][SPARK-10200][SPARK-10201][SPARK-...

2015-09-04 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/8507#issuecomment-137686626
  
@feynmanliang , I've removed case classes used for schema inference.





[GitHub] spark pull request: [SPARK-10437][SQL] Support aggregation expressi...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8599#issuecomment-137708683
  
  [Test build #41996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41996/console) for PR 8599 at commit [`452cfb5`](https://github.com/apache/spark/commit/452cfb5259e2942364aeede944cccaeda7d19a24).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10437][SQL] Support aggregation expressi...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8599#issuecomment-137708777
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10437][SQL] Support aggregation expressi...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8599#issuecomment-137708779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41996/
Test PASSed.





[GitHub] spark pull request: [SPARK-10227] fatal warnings with sbt on Scala...

2015-09-04 Thread skyluc
Github user skyluc commented on a diff in the pull request:

https://github.com/apache/spark/pull/8433#discussion_r38743996
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -24,6 +24,7 @@ import java.util.{Collections, ArrayList => JArrayList, 
List => JList, Map => JM
 import scala.collection.JavaConverters._
 import scala.collection.mutable
 import scala.language.existentials
+import scala.annotation.meta._
--- End diff --

No, it is a leftover from the previous changes.





[GitHub] spark pull request: [SPARK-10310][SQL] Using \t as the field delime...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8476#discussion_r38745091
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
@@ -429,7 +429,8 @@ class HiveQuerySuite extends HiveComparisonTest with 
BeforeAndAfter {
   |'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' FROM src;
 """.stripMargin.replaceAll(System.lineSeparator(), " "))
 
-  test("transform with SerDe2") {
+  // TODO: Only support serde which compatible with TextRecordReader at 
the moment.
+  ignore("transform with SerDe2") {
--- End diff --

Why should this test case be ignored? The SQL query involved doesn't 
contain a `RECORDREADER` clause, so it should fall back to `TextRecordReader`, 
shouldn't it?





[GitHub] spark pull request: [SPARK-10227] fatal warnings with sbt on Scala...

2015-09-04 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/8433#issuecomment-137707108
  
Aside from possibly removing those imports, yes, this looks good to me for 
master/1.6.





[GitHub] spark pull request: [SPARK-10441][SQL] Save data correctly to json...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8597#discussion_r38742692
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala ---
@@ -100,6 +104,87 @@ abstract class HadoopFsRelationTest extends QueryTest 
with SQLTestUtils {
 }
   }
 
+  test("test all data types") {
+withTempDir { file =>
+  file.delete()
+
+  // Create the schema.
+  val struct =
+StructType(
+  StructField("f1", FloatType, true) ::
+StructField("f2", ArrayType(BooleanType), true) :: Nil)
+  val dataTypes =
+Seq(
+  StringType, BinaryType, NullType, BooleanType,
+  ByteType, ShortType, IntegerType, LongType,
+  FloatType, DoubleType, DecimalType(25, 5), DecimalType(6, 5),
+  DateType, TimestampType,
+  ArrayType(IntegerType), MapType(StringType, LongType), struct,
+  new MyDenseVectorUDT())
+  val fields = dataTypes.zipWithIndex.map { case (dataType, index) =>
+StructField(s"col$index", dataType, true)
+  }
+  val schema = StructType(fields)
+
+  // Create a RDD for the schema
+  val rdd =
+sqlContext.sparkContext.parallelize((1 to 100), 10).flatMap { i =>
+  val row1 = Row(
+s"str${i}: test save.",
+s"binary${i}: test save.".getBytes("UTF-8"),
+null,
+i % 2 == 0,
+i.toByte,
+i.toShort,
+i,
+Long.MaxValue - i.toLong,
+(i + 0.25).toFloat,
+(i + 0.75),
--- End diff --

Nit: `0.75D`, or add a `toDouble` call to make it explicit.





[GitHub] spark pull request: [SPARK-10310][SQL] Using \t as the field delime...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8476#discussion_r38747088
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
@@ -328,23 +361,27 @@ case class HiveScriptIOSchema (
 (columns, columnTypes)
   }
 
-  private def initSerDe(
+  private def createTableProperties(
   serdeClassName: String,
   columns: Seq[String],
   columnTypes: Seq[DataType],
-  serdeProps: Seq[(String, String)]): AbstractSerDe = {
-
-val serde = 
Utils.classForName(serdeClassName).newInstance.asInstanceOf[AbstractSerDe]
-
+  serdeProps: Seq[(String, String)]) = {
 val columnTypesNames = 
columnTypes.map(_.toTypeInfo.getTypeName()).mkString(",")
-
 var propsMap = serdeProps.toMap + (serdeConstants.LIST_COLUMNS -> 
columns.mkString(","))
 propsMap = propsMap + (serdeConstants.LIST_COLUMN_TYPES -> 
columnTypesNames)
-
+propsMap = propsMap + (serdeConstants.FIELD_DELIM -> "\t")
--- End diff --

Shouldn't we also specify the line delimiter here?





[GitHub] spark pull request: [SPARK-10199][SPARK-10200][SPARK-10201][SPARK-...

2015-09-04 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/8507#discussion_r38733907
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala ---
@@ -125,16 +126,19 @@ object KMeansModel extends Loader[KMeansModel] {
 
 def save(sc: SparkContext, model: KMeansModel, path: String): Unit = {
   val sqlContext = new SQLContext(sc)
-  import sqlContext.implicits._
   val metadata = compact(render(
 ("class" -> thisClassName) ~ ("version" -> thisFormatVersion) ~ 
("k" -> model.k)))
   sc.parallelize(Seq(metadata), 
1).saveAsTextFile(Loader.metadataPath(path))
-  val dataRDD = sc.parallelize(model.clusterCenters.zipWithIndex).map 
{ case (point, id) =>
-Cluster(id, point)
--- End diff --

Removed the case classes except `NodeData`, `SplitData`, and `PredictData`; these 
classes simplify data extraction.





[GitHub] spark pull request: SPARK-10445: Extend Maven version (enforcer)

2015-09-04 Thread jbonofre
Github user jbonofre closed the pull request at:

https://github.com/apache/spark/pull/8598





[GitHub] spark pull request: [SPARK-10310][SQL] Using \t as the field delime...

2015-09-04 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/8476#issuecomment-137723720
  
@zhichao-li Could you please add a test case that explicitly checks the 
output format of a transformation query?





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8537#issuecomment-137726639
  
  [Test build #41998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41998/consoleFull) for PR 8537 at commit [`8660d0e`](https://github.com/apache/spark/commit/8660d0e2a815b367cc9f34251926e315bc95f9c1).





[GitHub] spark pull request: SPARK-10445: Extend Maven version (enforcer)

2015-09-04 Thread jbonofre
Github user jbonofre commented on the pull request:

https://github.com/apache/spark/pull/8598#issuecomment-137684872
  
All right, weird, as it works with Maven 3.2.5 for me.





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/8600

[SPARK-10446][SQL] Support to specify join type when calling join with 
usingColumns

JIRA: https://issues.apache.org/jira/browse/SPARK-10446

Currently the method `join(right: DataFrame, usingColumns: Seq[String])` 
only supports inner join. It is more convenient to have it support other join 
types.
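
A sketch of how the proposed overload might be used (the three-argument signature is assumed from the PR description, not from merged code):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("usingColumns-demo"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val left  = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "l")
val right = sc.parallelize(Seq((2, "x"))).toDF("id", "r")

// Today join(right, usingColumns) is inner-join only; the PR would allow
// an explicit join type while still deduplicating the join columns:
val joined = left.join(right, Seq("id"), "left_outer")
joined.show() // the id=1 row is kept, with a null `r` column

sc.stop()
```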

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 usingcolumns_df

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8600.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8600


commit 5ab4846852723d1c3505223e18c41dbf7bc40fa0
Author: Liang-Chi Hsieh 
Date:   2015-09-04T09:43:35Z

Support to specify join type when calling join with usingColumns.

commit 8ff97ede7250e032c88cf20a4e95f3e1e1cd416f
Author: Liang-Chi Hsieh 
Date:   2015-09-04T10:06:35Z

Add unit test.







[GitHub] spark pull request: [SPARK-10441][SQL] Save data correctly to json...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8597#discussion_r38742782
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala ---
@@ -100,6 +104,87 @@ abstract class HadoopFsRelationTest extends QueryTest 
with SQLTestUtils {
 }
   }
 
+  test("test all data types") {
+withTempDir { file =>
+  file.delete()
+
+  // Create the schema.
+  val struct =
+StructType(
+  StructField("f1", FloatType, true) ::
+StructField("f2", ArrayType(BooleanType), true) :: Nil)
+  val dataTypes =
+Seq(
+  StringType, BinaryType, NullType, BooleanType,
+  ByteType, ShortType, IntegerType, LongType,
+  FloatType, DoubleType, DecimalType(25, 5), DecimalType(6, 5),
+  DateType, TimestampType,
+  ArrayType(IntegerType), MapType(StringType, LongType), struct,
+  new MyDenseVectorUDT())
+  val fields = dataTypes.zipWithIndex.map { case (dataType, index) =>
+StructField(s"col$index", dataType, true)
+  }
+  val schema = StructType(fields)
+
+  // Create a RDD for the schema
+  val rdd =
+sqlContext.sparkContext.parallelize((1 to 100), 10).flatMap { i =>
+  val row1 = Row(
+s"str${i}: test save.",
+s"binary${i}: test save.".getBytes("UTF-8"),
+null,
+i % 2 == 0,
+i.toByte,
+i.toShort,
+i,
+Long.MaxValue - i.toLong,
+(i + 0.25).toFloat,
+(i + 0.75),
+BigDecimal(Long.MaxValue.toString + ".12345"),
+new java.math.BigDecimal(s"${i % 9 + 1}" + ".23456"),
+new Date(i),
+new Timestamp(i),
+(1 to i).toSeq,
+(0 to i).map(j => s"map_key_$j" -> (Long.MaxValue - j)).toMap,
+Row((i - 0.25).toFloat, Seq(true, false, null)),
+new MyDenseVector(Array(1.1, 2.1, 3.1)))
+  val row2 = Row.fromSeq(Seq.fill(dataTypes.length)(null))
+  row1 :: row2 :: Nil
+}
--- End diff --

It seems that `RandomDataGenerator` could help here?





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137715359
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41997/
Test FAILed.





[GitHub] spark pull request: [SPARK-10310][SQL] Using \t as the field delime...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8476#discussion_r38745795
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
@@ -290,7 +300,9 @@ case class HiveScriptIOSchema (
 outputSerdeClass: Option[String],
 inputSerdeProps: Seq[(String, String)],
 outputSerdeProps: Seq[(String, String)],
-schemaLess: Boolean) extends ScriptInputOutputSchema with 
HiveInspectors {
+schemaLess: Boolean,
+recordWriter: String,
+recordReader: String) extends ScriptInputOutputSchema with 
HiveInspectors {
--- End diff --

Use `Option[String]` instead of `String` for `recordWriter` and 
`recordReader`, and don't use the empty string as their default value elsewhere 
in this PR.
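
The shape of the suggested change, sketched on a stripped-down version of the case class (the two field names follow the diff; everything else is illustrative):

```scala
// Absence of a reader/writer is encoded in the type, not as a "" sentinel.
case class IOSchema(
    recordWriter: Option[String],
    recordReader: Option[String])

val schema = IOSchema(recordWriter = None, recordReader = None)

// Callers resolve a default explicitly instead of comparing against "".
// The fallback class name here is an assumption for illustration.
val readerClass = schema.recordReader
  .getOrElse("org.apache.hadoop.hive.ql.exec.TextRecordReader")
assert(readerClass == "org.apache.hadoop.hive.ql.exec.TextRecordReader")
```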





[GitHub] spark pull request: SPARK-10445: Extend Maven version (enforcer)

2015-09-04 Thread jbonofre
Github user jbonofre commented on the pull request:

https://github.com/apache/spark/pull/8598#issuecomment-137692450
  
It makes sense, thanks!





[GitHub] spark pull request: [SPARK-10441][SQL] Save data correctly to json...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8597#discussion_r38742367
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala
 ---
@@ -100,6 +104,87 @@ abstract class HadoopFsRelationTest extends QueryTest 
with SQLTestUtils {
 }
   }
 
+  test("test all data types") {
+withTempDir { file =>
+  file.delete()
--- End diff --

You can use `withTempPath` here.  It provides a temporary path without 
creating the directory, so that you don't need the `delete()` call.
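The helper being recommended can be sketched like this; it is a simplified stand-in modelled after the `withTempPath` utility quoted later in this thread, not the actual Spark test code (the `deleteRecursively` helper here is a local reimplementation):

```scala
import java.io.File
import java.nio.file.Files

// Sketch of a withTempPath-style helper: hand `f` a path that does NOT
// exist yet, then clean up whatever `f` created.
object TempPathSketch {
  def withTempPath(f: File => Unit): Unit = {
    val dir = Files.createTempDirectory("spark-test").toFile
    dir.delete() // the caller receives a non-existent path
    try f(dir) finally deleteRecursively(dir)
  }

  private def deleteRecursively(file: File): Unit = {
    Option(file.listFiles()).foreach(_.foreach(deleteRecursively))
    file.delete()
  }

  def main(args: Array[String]): Unit = {
    var existedOnEntry = true
    withTempPath { path =>
      existedOnEntry = path.exists()
      path.mkdirs() // the test body may create it
    }
    assert(!existedOnEntry) // it did not exist when handed to the body
  }
}
```

Because the path is deleted before `f` runs, the test body no longer needs its own `file.delete()` call.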





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137715358
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137715324
  
  [Test build #41997 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41997/console)
 for   PR 8600 at commit 
[`8ff97ed`](https://github.com/apache/spark/commit/8ff97ede7250e032c88cf20a4e95f3e1e1cd416f).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class BlockFetchException(messages: String, throwable: Throwable)`






[GitHub] spark pull request: [SPARK-10437][SQ] Support aggregation expressi...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8599#issuecomment-137684190
  
  [Test build #41996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41996/consoleFull)
 for   PR 8599 at commit 
[`452cfb5`](https://github.com/apache/spark/commit/452cfb5259e2942364aeede944cccaeda7d19a24).





[GitHub] spark pull request: [SPARK-10227] fatal warnings with sbt on Scala...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8433#issuecomment-137698769
  
  [Test build #1719 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1719/console)
 for   PR 8433 at commit 
[`0408404`](https://github.com/apache/spark/commit/04084043276cba2b773b5895a0935278ccc611bd).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10441][SQL] Save data correctly to json...

2015-09-04 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/8597#issuecomment-137714184
  
Generally looks good except for a few minor issues.





[GitHub] spark pull request: SPARK-10445: Extend Maven version (enforcer)

2015-09-04 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/8598#issuecomment-137686197
  
Yeah, it 99% works -- I recall that the problem was a little subtle, some issue 
with dependencies or artifacts, and it may only affect a small number of use 
cases, but it's still worth avoiding.





[GitHub] spark pull request: [SPARK-9669][MESOS] Support PySpark on Mesos c...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8349#issuecomment-137710055
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41995/
Test PASSed.





[GitHub] spark pull request: [SPARK-9669][MESOS] Support PySpark on Mesos c...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8349#issuecomment-137710050
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10310][SQL]Using \t as the field delime...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8476#discussion_r38745663
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala
 ---
@@ -58,6 +61,9 @@ case class ScriptTransformation(
 
   override def otherCopyArgs: Seq[HiveContext] = sc :: Nil
 
+  private val _broadcastedHiveConf = this.
+sc.sparkContext.broadcast(new SerializableConfiguration(sc.hiveconf))
--- End diff --

You probably don't need broadcasting here. `SerializableConfiguration` 
already avoids reading XML files while deserializing `Configuration` instances. 
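The pattern behind that remark can be illustrated with a toy wrapper. This is a hedged sketch modelled after the idea of Spark's `SerializableConfiguration` (a plain `Map` stands in for a Hadoop `Configuration`, and `SerializableSettings` is an invented name): custom `writeObject`/`readObject` methods copy only the explicit entries, so deserialization never re-reads default resource files.

```scala
import java.io._

// Toy wrapper with custom Java serialization: only the explicit key/value
// entries cross the wire, and deserialization rebuilds the map directly.
class SerializableSettings(@transient var settings: Map[String, String])
  extends Serializable {

  private def writeObject(out: ObjectOutputStream): Unit = {
    out.writeInt(settings.size)
    settings.foreach { case (k, v) => out.writeUTF(k); out.writeUTF(v) }
  }

  private def readObject(in: ObjectInputStream): Unit = {
    val n = in.readInt()
    settings = (0 until n).map(_ => in.readUTF() -> in.readUTF()).toMap
  }
}

object SerializableSettingsSketch {
  def roundTrip(s: SerializableSettings): SerializableSettings = {
    val bytes = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bytes)
    oos.writeObject(s); oos.close()
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    ois.readObject().asInstanceOf[SerializableSettings]
  }

  def main(args: Array[String]): Unit = {
    val orig = new SerializableSettings(Map("hive.exec.mode.local.auto" -> "true"))
    assert(roundTrip(orig).settings == orig.settings)
  }
}
```

Since each deserialized copy is cheap to construct this way, broadcasting the wrapper buys little over simply shipping it in the task closure.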





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8537#issuecomment-137725481
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8537#issuecomment-137725498
  
Merged build started.





[GitHub] spark pull request: [SPARK-9652][CORE] Added method for Avro file ...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/7971#discussion_r38748233
  
--- Diff: core/src/test/scala/org/apache/spark/SparkFunSuite.scala ---
@@ -44,5 +49,27 @@ private[spark] abstract class SparkFunSuite extends 
FunSuite with Logging {
   logInfo(s"\n\n= FINISHED $shortSuiteName: '$testName' =\n")
 }
   }
+  /**
+   * Generates a temporary path without creating the actual 
file/directory, then passes it to `f`. If
+   * a file/directory is created there by `f`, it will be deleted after `f` 
returns.
+   *
+   * @todo Probably this method should be moved to a more general place
+   */
+  protected def withTempPath(f: File => Unit): Unit = {
+val path = Utils.createTempDir()
+path.delete()
+try f(path) finally Utils.deleteRecursively(path)
+  }
+
+  /**
+   * Creates a temporary directory, which is then passed to `f` and will 
be deleted after `f`
+   * returns.
+   *
+   * @todo Probably this method should be moved to a more general place
+   */
+  protected def withTempDir(f: File => Unit): Unit = {
+val dir = Utils.createTempDir().getCanonicalFile
+try f(dir) finally Utils.deleteRecursively(dir)
+  }
--- End diff --

Please remove the `withTempDir` method defined in 
`OrcPartitionDiscoverySuite`. It's causing a compilation error since an 
`override` is missing there.





[GitHub] spark pull request: [SPARK-10192] [core] simple test w/ failure in...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8402#issuecomment-137746771
  
  [Test build #41999 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41999/consoleFull)
 for   PR 8402 at commit 
[`cfcf4e6`](https://github.com/apache/spark/commit/cfcf4e667121b4225ce327f5f764b00677059865).





[GitHub] spark pull request: [SPARK-9652][CORE] Added method for Avro file ...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7971#issuecomment-137751981
  
  [Test build #42000 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42000/consoleFull)
 for   PR 7971 at commit 
[`bc8f2be`](https://github.com/apache/spark/commit/bc8f2beb80bd10f71eff1010e250cfccc99d9a8e).





[GitHub] spark pull request: [SPARK-9652][CORE] Added method for Avro file ...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7971#issuecomment-137751364
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9851] Support submitting map stages ind...

2015-09-04 Thread squito
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/8180#issuecomment-137745960
  
Sorry Andrew, I certainly didn't mean to imply that I knew your opinion on 
this particular patch -- as Sean said, I was just trying to point out the 
general situation. My point is just that new features add complexity and make 
it harder to fix bugs, and we have plenty of complexity and bugs now. I don't 
mean to block the patch; it's a cool feature. I'm just nervous about any change 
to the DAG scheduler (e.g., even my own proposal in 
https://github.com/apache/spark/pull/8427, which is 30 lines plus 100 lines of 
tests, and most likely still needs more testing). Perhaps I err too much on 
the side of caution; I'm just providing a counterpoint.

Thanks for adding the additional tests, Matei. Btw, there is an 
example test for skipped stages you should be able to copy more or less here: 
https://github.com/apache/spark/pull/8402





[GitHub] spark pull request: [SPARK-9652][CORE] Added method for Avro file ...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7971#issuecomment-137751393
  
Merged build started.





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137752519
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137752570
  
Merged build started.





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137754680
  
  [Test build #42001 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42001/consoleFull)
 for   PR 8600 at commit 
[`efe069a`](https://github.com/apache/spark/commit/efe069aabfb3b06f2a9884153bb035022265652f).





[GitHub] spark pull request: [SPARK-9170][SQL] Use OrcStructInspector to be...

2015-09-04 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-137728254
  
@viirya Oh, sorry. It would be nice if you ping me after you update your PR 
next time :)





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8537#issuecomment-137737040
  
  [Test build #41998 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41998/console)
 for   PR 8537 at commit 
[`8660d0e`](https://github.com/apache/spark/commit/8660d0e2a815b367cc9f34251926e315bc95f9c1).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends RelationProvider with DataSourceRegister `
  * `  implicit class LibSVMReader(read: DataFrameReader) `






[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8537#issuecomment-137737175
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10310][SQL]Using \t as the field delime...

2015-09-04 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/8476#issuecomment-137726968
  
I'm not super familiar with the script transformation feature. If I 
understand this problem correctly, prior versions didn't support the 
`RECORDREADER` or `RECORDWRITER` clauses and thus always fell back to 
`TextRecordReader` and `TextRecordWriter`; however, we didn't specify the line 
delimiter or field delimiter properly. Is that right?

It seems that this PR not only tries to fix the delimiter issue, but also 
adds support for the `RECORDREADER` and `RECORDWRITER` clauses, which I think 
could be moved into a separate PR to simplify this one.





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8537#issuecomment-137737181
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41998/
Test PASSed.





[GitHub] spark pull request: [SPARK-10192] [core] simple test w/ failure in...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8402#issuecomment-137745447
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10192] [core] simple test w/ failure in...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8402#issuecomment-137745469
  
Merged build started.





[GitHub] spark pull request: [SPARK-10437][SQ] Support aggregation expressi...

2015-09-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/8599#discussion_r38762488
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -561,7 +561,7 @@ class Analyzer(
 }
 
   case sort @ Sort(sortOrder, global, aggregate: Aggregate)
-if aggregate.resolved && !sort.resolved =>
+if aggregate.resolved =>
--- End diff --

We need to set up a stop condition for this rule, or something like `SELECT 
a, SUM(b) FROM t GROUP BY a ORDER BY a` will go through this rule again and 
again until it reaches the fixed point. How about changing the end of this rule to:
```
if (evaluatedOrderings == sortOrder) {
  sort
} else {
  Project(aggregate.output,
Sort(evaluatedOrderings, global,
  aggregate.copy(aggregateExpressions = originalAggExprs ++ 
needsPushDown)))
}
```
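The concern above is the general fixed-point contract for rewrite rules. The following is a simplified, hypothetical model (a `List[String]` of sort keys stands in for a Catalyst plan; the rule and names are invented for illustration): a rule must return its input unchanged once it no longer applies, or the rule executor loops forever.

```scala
// Toy fixed-point rule executor: the rule rewrites "sum(b)" into a
// pushed-down column name; once nothing changes, iteration stops.
object FixedPointSketch {
  def rule(plan: List[String]): List[String] =
    plan.map(k => if (k == "sum(b)") "_pushed_sum_b" else k)

  def fixedPoint(plan: List[String], maxIter: Int = 100): List[String] = {
    var current = plan
    var iter = 0
    while (iter < maxIter) {
      val next = rule(current)
      if (next == current) return current // stop condition: rule is a no-op
      current = next
      iter += 1
    }
    sys.error("rule did not converge")
  }

  def main(args: Array[String]): Unit = {
    assert(fixedPoint(List("a", "sum(b)")) == List("a", "_pushed_sum_b"))
    // Re-running on an already-rewritten plan is a no-op, so we terminate.
    assert(fixedPoint(List("a", "_pushed_sum_b")) == List("a", "_pushed_sum_b"))
  }
}
```

The `evaluatedOrderings == sortOrder` check in the suggested Catalyst change plays exactly the role of the `next == current` test here.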





[GitHub] spark pull request: [SPARK-10437][SQ] Support aggregation expressi...

2015-09-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/8599#discussion_r38762581
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -1519,6 +1519,19 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 |ORDER BY sum(b) + 1
   """.stripMargin),
   Row("4", 3) :: Row("1", 7) :: Row("3", 11) :: Row("2", 15) :: Nil)
+
+Seq("1" -> 3, "2" -> 7, "2" -> 8, "3" -> 5, "3" -> 6, "3" -> 2, "4" -> 
1, "4" -> 2,
+  "4" -> 3, "4" -> 4).toDF("a", "b").registerTempTable("orderByData2")
--- End diff --

Why add `orderByData2`? I thought `orderByData` could also reproduce this 
issue?





[GitHub] spark pull request: [SPARK-10437][SQ] Support aggregation expressi...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8599#issuecomment-137778681
  
  [Test build #42003 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42003/consoleFull)
 for   PR 8599 at commit 
[`e65f4db`](https://github.com/apache/spark/commit/e65f4dbf1fa00d786ba7ebf0f302861965d6d5cc).





[GitHub] spark pull request: [SPARK-9851] Support submitting map stages ind...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8180#issuecomment-137779452
  
Merged build started.





[GitHub] spark pull request: [SPARK-9851] Support submitting map stages ind...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8180#issuecomment-137779431
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10441][SQL] Save data correctly to json...

2015-09-04 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8597#discussion_r38769233
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala ---
@@ -100,6 +104,87 @@ abstract class HadoopFsRelationTest extends QueryTest 
with SQLTestUtils {
 }
   }
 
+  test("test all data types") {
+withTempDir { file =>
+  file.delete()
+
+  // Create the schema.
+  val struct =
+StructType(
+  StructField("f1", FloatType, true) ::
+StructField("f2", ArrayType(BooleanType), true) :: Nil)
+  val dataTypes =
+Seq(
+  StringType, BinaryType, NullType, BooleanType,
+  ByteType, ShortType, IntegerType, LongType,
+  FloatType, DoubleType, DecimalType(25, 5), DecimalType(6, 5),
+  DateType, TimestampType,
+  ArrayType(IntegerType), MapType(StringType, LongType), struct,
+  new MyDenseVectorUDT())
--- End diff --

I do not think we can save it to any data source right now.





[GitHub] spark pull request: [SPARK-9851] Support submitting map stages ind...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8180#issuecomment-137781689
  
Build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9926] [SPARK-10340] [SQL] Use S3 bulk l...

2015-09-04 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/8512#issuecomment-137765159
  
@ewan-realitymine, yeah, that is @davies' point about `HadoopFsRelation`.





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8537#discussion_r38765918
  
--- Diff: mllib/src/test/java/org/apache/spark/ml/source/JavaLibSVMRelationSuite.java ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source;
+
+import com.google.common.base.Charsets;
+import com.google.common.io.Files;
+
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.util.Utils;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+
+/**
+ * Test LibSVMRelation in Java.
+ */
+public class JavaLibSVMRelationSuite {
+  private transient JavaSparkContext jsc;
+  private transient SQLContext jsql;
+  private transient DataFrame dataset;
+
+  private File path;
+
+  @Before
+  public void setUp() throws IOException {
+jsc = new JavaSparkContext("local", "JavaLibSVMRelationSuite");
+jsql = new SQLContext(jsc);
+
+path = Utils.createTempDir(System.getProperty("java.io.tmpdir"), 
"datasource")
+  .getCanonicalFile();
+if (path.exists()) {
+  path.delete();
+}
+
+String s = "1 1:1.0 3:2.0 5:3.0\n0\n0 2:4.0 4:5.0 6:6.0";
+Files.write(s, path, Charsets.US_ASCII);
+  }
+
+  @After
+  public void tearDown() {
+jsc.stop();
+jsc = null;
+path.delete();
+  }
+
+  @Test
+  public void verifyLibSVMDF() {
+dataset = 
jsql.read().format("org.apache.spark.ml.source.libsvm").load(path.getPath());
--- End diff --

Add `option("vectorType", "dense")`?





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8537#discussion_r38765954
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/source/LibSVMRelationSuite.scala ---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source
+
+import java.io.File
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.source.libsvm._
+import org.apache.spark.mllib.linalg.{SparseVector, Vectors, DenseVector}
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.util.Utils
+
+class LibSVMRelationSuite extends SparkFunSuite with MLlibTestSparkContext 
{
+  var path: String = _
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+val lines =
+  """
+|1 1:1.0 3:2.0 5:3.0
+|0
+|0 2:4.0 4:5.0 6:6.0
+  """.stripMargin
+val tempDir = Utils.createTempDir()
+val file = new File(tempDir.getPath, "part-0")
+Files.write(lines, file, Charsets.US_ASCII)
+path = tempDir.toURI.toString
+  }
+
+  test("select as sparse vector") {
+val df = sqlContext.read.options(Map("numFeatures" -> 
"6")).libsvm(path)
+assert(df.columns(0) == "label")
+assert(df.columns(1) == "features")
+val row1 = df.first()
+assert(row1.getDouble(0) == 1.0)
+assert(row1.getAs[SparseVector](1) == Vectors.sparse(6, Seq((0, 1.0), 
(2, 2.0), (4, 3.0
--- End diff --

This doesn't verify that the result is a sparse vector, because of runtime type erasure. We need

~~~scala
val v = row1.getAs[SparseVector](1)
assert(v == Vectors.sparse(...))
~~~

to force check.
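A standalone sketch of the erasure point, using a toy `getAs` as a stand-in for `Row.getAs[T]` (the helper object and vector values here are hypothetical, not from the PR):

```scala
import org.apache.spark.mllib.linalg.{SparseVector, Vectors}

object ErasureDemo {
  // Stand-in for Row.getAs[T]: T is erased, so asInstanceOf[T] compiles
  // to no runtime check at all.
  def getAs[T](v: Any): T = v.asInstanceOf[T]

  def main(args: Array[String]): Unit = {
    val dense = Vectors.dense(1.0, 0.0, 2.0) // runtime class: DenseVector

    // Inline form: == accepts Any, so scalac inserts no checkcast here.
    // mllib's Vector equality is structural across dense/sparse, so the
    // assertion can pass even though the value is not a SparseVector.
    assert(getAs[SparseVector](dense) == dense.toSparse)

    // Typed binding: storing into a SparseVector-typed val makes scalac
    // emit a checkcast, so the DenseVector fails here with a
    // ClassCastException.
    val v: SparseVector = getAs[SparseVector](dense)
  }
}
```

This is why binding the value to an explicitly typed `val` first, as suggested, actually exercises the cast.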





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8537#discussion_r38765916
  
--- Diff: mllib/src/test/java/org/apache/spark/ml/source/JavaLibSVMRelationSuite.java ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source;
+
+import com.google.common.base.Charsets;
+import com.google.common.io.Files;
+
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.util.Utils;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+
+/**
+ * Test LibSVMRelation in Java.
+ */
+public class JavaLibSVMRelationSuite {
+  private transient JavaSparkContext jsc;
+  private transient SQLContext jsql;
+  private transient DataFrame dataset;
+
+  private File path;
+
+  @Before
+  public void setUp() throws IOException {
+jsc = new JavaSparkContext("local", "JavaLibSVMRelationSuite");
+jsql = new SQLContext(jsc);
+
+path = Utils.createTempDir(System.getProperty("java.io.tmpdir"), 
"datasource")
+  .getCanonicalFile();
--- End diff --

minor: calling `.getCanonicalFile` and checking `path.exists()` are not 
necessary





[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5423#issuecomment-137789617
  
  [Test build #42005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42005/consoleFull) for PR 5423 at commit [`7a348f5`](https://github.com/apache/spark/commit/7a348f553b6b747d76ceb7f4e51478f875df36b0).





[GitHub] spark pull request: [SPARK-9834][MLLIB] implement weighted least s...

2015-09-04 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8588#discussion_r38772511
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.optim
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.optim.WeightedLeastSquares.Instance
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.mllib.util.TestingUtils._
+import org.apache.spark.rdd.RDD
+
+class WeightedLeastSquaresSuite extends SparkFunSuite with 
MLlibTestSparkContext {
+
+  private var instances: RDD[Instance] = _
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+/*
+   R code:
+
+A <- matrix(c(0, 1, 2, 3, 5, 7, 11, 13), 4, 2)
+b <- c(17, 19, 23, 29)
+w <- c(1, 2, 3, 4)
+ */
+instances = sc.parallelize(Seq(
+  Instance(1.0, Vectors.dense(0.0, 5.0).toSparse, 17.0),
+  Instance(2.0, Vectors.dense(1.0, 7.0), 19.0),
+  Instance(3.0, Vectors.dense(2.0, 11.0), 23.0),
+  Instance(4.0, Vectors.dense(3.0, 13.0), 29.0)
+), 2)
+  }
+
+  test("WLS against lm") {
+/*
+   R code:
+
+df <- as.data.frame(cbind(A, b))
+for (formula in c(b ~ . -1, b ~ .)) {
+  model <- lm(formula, data=df, weights=w)
+  print(as.vector(coef(model)))
+}
+
+[1] -3.727121  3.009983
+[1] 18.08  6.08 -0.60
+ */
+
+val expected = Seq(
+  Vectors.dense(0.0, -3.727121, 3.009983),
+  Vectors.dense(18.08, 6.08, -0.60))
+
+var idx = 0
+for (fitIntercept <- Seq(false, true)) {
+  val wls = new WeightedLeastSquares(
+fitIntercept, regParam = 0.0, standardizeFeatures = false, 
standardizeLabel = false)
--- End diff --

Do we need `standardizeLabel`? I think that without regularization, standardizing the label will not change the solution.
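The unregularized claim can be checked directly; a sketch, writing $\sigma_y$ for the label scale and substituting $\gamma = \sigma_y \beta$:

```latex
\min_\beta \sum_i w_i \Big(\frac{y_i}{\sigma_y} - x_i^\top \beta\Big)^2
  = \frac{1}{\sigma_y^2}\, \min_\gamma \sum_i w_i \big(y_i - x_i^\top \gamma\big)^2 ,
```

so $\hat\gamma$ equals the unstandardized solution, and unscaling, $\sigma_y \hat\beta = \hat\gamma$, recovers exactly the raw coefficients. The invariance is specific to the unpenalized objective: an $L_1$ penalty $\lambda\lVert\beta\rVert_1$, for example, becomes $\frac{\lambda}{\sigma_y}\lVert\gamma\rVert_1$ after the substitution, i.e. an effective penalty of $\lambda\sigma_y$ relative to the rescaled loss.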





[GitHub] spark pull request: [SPARK-10301] [SQL] Fixes schema merging for n...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8509#discussion_r38760022
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala ---
@@ -160,4 +101,168 @@ private[parquet] object CatalystReadSupport {
   val SPARK_ROW_REQUESTED_SCHEMA = 
"org.apache.spark.sql.parquet.row.requested_schema"
 
   val SPARK_METADATA_KEY = "org.apache.spark.sql.parquet.row.metadata"
+
+  /**
+   * Tailors `parquetSchema` according to `catalystSchema` by removing column paths that don't exist
+   * in `catalystSchema`, and adding those that only exist in `catalystSchema`.
+   */
+  def clipParquetSchema(parquetSchema: MessageType, catalystSchema: 
StructType): MessageType = {
+val clippedParquetFields = 
clipParquetGroupFields(parquetSchema.asGroupType(), catalystSchema)
+Types.buildMessage().addFields(clippedParquetFields: _*).named("root")
+  }
+
+  private def clipParquetType(parquetType: Type, catalystType: DataType): 
Type = {
+catalystType match {
+  case t: ArrayType if !isPrimitiveCatalystType(t.elementType) =>
+// Only clips array types with nested type as element type.
+clipParquetListType(parquetType.asGroupType(), t.elementType)
+
+  case t: MapType if !isPrimitiveCatalystType(t.valueType) =>
+// Only clips map types with nested type as value type.
+clipParquetMapType(parquetType.asGroupType(), t.keyType, 
t.valueType)
+
+  case t: StructType =>
+clipParquetGroup(parquetType.asGroupType(), t)
+
+  case _ =>
+parquetType
+}
+  }
+
+  /**
+   * Whether a Catalyst [[DataType]] is primitive.  Primitive [[DataType]] 
is not equivalent to
+   * [[AtomicType]].  For example, [[CalendarIntervalType]] is primitive, 
but it's not an
+   * [[AtomicType]].
+   */
+  private def isPrimitiveCatalystType(dataType: DataType): Boolean = {
+dataType match {
+  case _: ArrayType | _: MapType | _: StructType => false
+  case _ => true
+}
+  }
+
+  /**
+   * Clips a Parquet [[GroupType]] which corresponds to a Catalyst 
[[ArrayType]].  The element type
+   * of the [[ArrayType]] should also be a nested type, namely an 
[[ArrayType]], a [[MapType]], or a
+   * [[StructType]].
+   */
+  private def clipParquetListType(parquetList: GroupType, elementType: 
DataType): Type = {
+// Precondition of this method, should only be called for lists with 
nested element types.
+assert(!isPrimitiveCatalystType(elementType))
+
+// Unannotated repeated group should be interpreted as required list 
of required element, so
+// list element type is just the group itself.  Clip it.
+if (parquetList.getOriginalType == null && 
parquetList.isRepetition(Repetition.REPEATED)) {
+  clipParquetType(parquetList, elementType)
+} else {
+  assert(
+parquetList.getOriginalType == OriginalType.LIST,
+"Invalid Parquet schema. " +
+  "Original type of annotated Parquet lists must be LIST: " +
+  parquetList.toString)
+
+  assert(
+parquetList.getFieldCount == 1 && 
parquetList.getType(0).isRepetition(Repetition.REPEATED),
+"Invalid Parquet schema. " +
+  "LIST-annotated group should only have exactly one repeated 
field: " +
+  parquetList)
+
+  // Precondition of this method, should only be called for lists with 
nested element types.
+  assert(!parquetList.getType(0).isPrimitive)
+
+  val repeatedGroup = parquetList.getType(0).asGroupType()
+
+  // If the repeated field is a group with multiple fields, or the 
repeated field is a group
+  // with one field and is named either "array" or uses the 
LIST-annotated group's name with
+  // "_tuple" appended then the repeated type is the element type and 
elements are required.
+  // Build a new LIST-annotated group with clipped `repeatedGroup` as 
element type and the
+  // only field.
+  if (
+repeatedGroup.getFieldCount > 1 ||
--- End diff --

This case corresponds to the 2nd rule of LIST backwards-compatibility rules 
defined here: 
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules
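Concretely, rule 2 covers Parquet schemas of this shape (example adapted from the linked spec; the field names are illustrative):

```
message ExampleList {
  optional group my_list (LIST) {
    // The repeated group has multiple fields, so per rule 2 the group
    // itself is the element type and the elements are required.
    repeated group element {
      required binary str (UTF8);
      required int32 num;
    }
  }
}
```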



[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137786977
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137786978
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42001/
Test PASSed.





[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5423#issuecomment-137788331
  
Merged build started.





[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5423#issuecomment-137788310
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10301] [SQL] Fixes schema merging for n...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8509#discussion_r38761117
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala ---
@@ -941,4 +942,313 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
   |  optional fixed_len_byte_array(8) f1 (DECIMAL(18, 3));
   |}
 """.stripMargin)
+
+  private def testSchemaClipping(
+  testName: String,
+  parquetSchema: String,
+  catalystSchema: StructType,
+  expectedSchema: String): Unit = {
+test(s"Clipping - $testName") {
+  val expected = MessageTypeParser.parseMessageType(expectedSchema)
+  val actual = CatalystReadSupport.clipParquetSchema(
+MessageTypeParser.parseMessageType(parquetSchema), catalystSchema)
+
+  try {
+expected.checkContains(actual)
+actual.checkContains(expected)
+  } catch { case cause: Throwable =>
+fail(
+  s"""Expected clipped schema:
+ |$expected
+ |Actual clipped schema:
+ |$actual
+   """.stripMargin,
+  cause)
+  }
+}
+  }
+
+  testSchemaClipping(
+"simple nested struct",
+
+parquetSchema =
+  """message root {
+|  required group f0 {
+|optional int32 f00;
+|optional int32 f01;
+|  }
+|}
+  """.stripMargin,
+
+catalystSchema = {
+  val f0Type = new StructType().add("f00", IntegerType, nullable = 
true)
+  new StructType()
+.add("f0", f0Type, nullable = false)
+.add("f1", IntegerType, nullable = true)
+},
+
+expectedSchema =
+  """message root {
+|  required group f0 {
+|optional int32 f00;
+|  }
+|  optional int32 f1;
+|}
+  """.stripMargin)
+
+  testSchemaClipping(
+"parquet-protobuf style array",
+
+parquetSchema =
+  """message root {
+|  required group f0 {
+|repeated binary f00 (UTF8);
+|repeated group f01 {
+|  optional int32 f010;
+|  optional double f011;
+|}
+|  }
+|}
+  """.stripMargin,
+
+catalystSchema = {
+  val f11Type = new StructType().add("f011", DoubleType, nullable = 
true)
--- End diff --

Yes, thanks! And the variable name is wrong, should be `f01Type`.





[GitHub] spark pull request: [SPARK-10437][SQL] Support aggregation expressi...

2015-09-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/8599#discussion_r3878
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -561,7 +561,7 @@ class Analyzer(
 }
 
   case sort @ Sort(sortOrder, global, aggregate: Aggregate)
-if aggregate.resolved && !sort.resolved =>
+if aggregate.resolved =>
--- End diff --

Thanks. I've updated it.





[GitHub] spark pull request: [SPARK-9834][MLLIB] implement weighted least s...

2015-09-04 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8588#discussion_r38772704
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.optim
+
+import com.github.fommil.netlib.LAPACK.{getInstance => lapack}
+import org.netlib.util.intW
+
+import org.apache.spark.Logging
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.linalg.distributed.RowMatrix
+import org.apache.spark.rdd.RDD
+
+/**
+ * Model fitted by [[WeightedLeastSquares]].
+ * @param coefficients model coefficients
+ * @param intercept model intercept
+ */
+private[ml] class WeightedLeastSquaresModel(
--- End diff --

Will you merge this code into current `LinearRegression.scala`?





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8537#discussion_r38765752
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source.libsvm
+
+import com.google.common.base.Objects
+
+import org.apache.spark.Logging
+import org.apache.spark.mllib.linalg.VectorUDT
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLUtils
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.types.{StructType, StructField, DoubleType}
+import org.apache.spark.sql.{Row, SQLContext}
+import org.apache.spark.sql.sources._
+
+/**
+ * LibSVMRelation provides the DataFrame constructed from LibSVM format 
data.
+ * @param path File path of LibSVM format
+ * @param numFeatures The number of features
+ * @param vectorType The type of vector. It can be 'sparse' or 'dense'
+ * @param sqlContext The Spark SQLContext
+ */
+private[ml] class LibSVMRelation(val path: String, val numFeatures: Int, 
val vectorType: String)
+(@transient val sqlContext: SQLContext)
+  extends BaseRelation with TableScan with Logging {
+
+  override def schema: StructType = StructType(
+StructField("label", DoubleType, nullable = false) ::
+  StructField("features", new VectorUDT(), nullable = false) :: Nil
+  )
+
+  override def buildScan(): RDD[Row] = {
+val sc = sqlContext.sparkContext
+val baseRdd = MLUtils.loadLibSVMFile(sc, path, numFeatures)
+
+val rowBuilders = Array(
--- End diff --

Do we need `rowBuilders`? Since we don't have extra optimization, the line 
below should be sufficient.

~~~scala
basedRdd.map(pt => Row(pt.label, pt.features))
~~~





[GitHub] spark pull request: [SPARK-9851] Support submitting map stages ind...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8180#issuecomment-137781454
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9851] Support submitting map stages ind...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8180#issuecomment-137781457
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42004/
Test FAILed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-137794798
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-137794776
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10301] [SQL] Fixes schema merging for n...

2015-09-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8509#discussion_r38764680
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala ---
@@ -160,4 +101,168 @@ private[parquet] object CatalystReadSupport {
   val SPARK_ROW_REQUESTED_SCHEMA = "org.apache.spark.sql.parquet.row.requested_schema"
 
   val SPARK_METADATA_KEY = "org.apache.spark.sql.parquet.row.metadata"
+
+  /**
+   * Tailors `parquetSchema` according to `catalystSchema` by removing column paths that don't
+   * exist in `catalystSchema`, and adding those that only exist in `catalystSchema`.
+   */
+  def clipParquetSchema(parquetSchema: MessageType, catalystSchema: StructType): MessageType = {
+    val clippedParquetFields = clipParquetGroupFields(parquetSchema.asGroupType(), catalystSchema)
+    Types.buildMessage().addFields(clippedParquetFields: _*).named("root")
+  }
+
+  private def clipParquetType(parquetType: Type, catalystType: DataType): Type = {
+    catalystType match {
+      case t: ArrayType if !isPrimitiveCatalystType(t.elementType) =>
+        // Only clips array types with nested type as element type.
+        clipParquetListType(parquetType.asGroupType(), t.elementType)
+
+      case t: MapType if !isPrimitiveCatalystType(t.valueType) =>
+        // Only clips map types with nested type as value type.
+        clipParquetMapType(parquetType.asGroupType(), t.keyType, t.valueType)
+
+      case t: StructType =>
+        clipParquetGroup(parquetType.asGroupType(), t)
+
+      case _ =>
+        parquetType
+    }
+  }
+
+  /**
+   * Whether a Catalyst [[DataType]] is primitive.  Primitive [[DataType]] is not equivalent to
+   * [[AtomicType]].  For example, [[CalendarIntervalType]] is primitive, but it's not an
+   * [[AtomicType]].
+   */
+  private def isPrimitiveCatalystType(dataType: DataType): Boolean = {
+    dataType match {
+      case _: ArrayType | _: MapType | _: StructType => false
+      case _ => true
+    }
+  }
+
+  /**
+   * Clips a Parquet [[GroupType]] which corresponds to a Catalyst [[ArrayType]].  The element type
+   * of the [[ArrayType]] should also be a nested type, namely an [[ArrayType]], a [[MapType]], or a
+   * [[StructType]].
+   */
+  private def clipParquetListType(parquetList: GroupType, elementType: DataType): Type = {
+    // Precondition of this method, should only be called for lists with nested element types.
+    assert(!isPrimitiveCatalystType(elementType))
+
+    // Unannotated repeated group should be interpreted as required list of required element, so
+    // list element type is just the group itself.  Clip it.
+    if (parquetList.getOriginalType == null && parquetList.isRepetition(Repetition.REPEATED)) {
+      clipParquetType(parquetList, elementType)
+    } else {
+      assert(
+        parquetList.getOriginalType == OriginalType.LIST,
+        "Invalid Parquet schema. " +
+          "Original type of annotated Parquet lists must be LIST: " + parquetList.toString)
+
+      assert(
+        parquetList.getFieldCount == 1 && parquetList.getType(0).isRepetition(Repetition.REPEATED),
+        "Invalid Parquet schema. " +
+          "LIST-annotated group should only have exactly one repeated field: " + parquetList)
+
+      // Precondition of this method, should only be called for lists with nested element types.
+      assert(!parquetList.getType(0).isPrimitive)
+
+      val repeatedGroup = parquetList.getType(0).asGroupType()
+
+      // If the repeated field is a group with multiple fields, or the repeated field is a group
+      // with one field and is named either "array" or uses the LIST-annotated group's name with
+      // "_tuple" appended then the repeated type is the element type and elements are required.
+      // Build a new LIST-annotated group with clipped `repeatedGroup` as element type and the
+      // only field.
+      if (
+        repeatedGroup.getFieldCount > 1 ||
--- End diff --

Actually this method is a direct mapping of the LIST backwards-compatibility 
rules defined in the link above. But lists of primitive types are not handled in 
this method, since we only care about complex element types.
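The backward-compatibility dispatch referred to above can be modeled in a few lines of plain Scala. This is a hedged sketch with a simplified `PGroup` stand-in (an assumption, not Parquet's `GroupType` API): the repeated group is itself the element type when it has multiple fields, is named "array", or is named after the list with "_tuple" appended.

```scala
// Simplified stand-in for a Parquet group: just a name and its field names.
case class PGroup(name: String, fieldNames: List[String])

// Legacy LIST rule sketched from the diff's comment: the repeated group IS the
// element type when it has >1 field, or is named "array", or "<list>_tuple".
def repeatedGroupIsElementType(listName: String, repeated: PGroup): Boolean =
  repeated.fieldNames.length > 1 ||
    repeated.name == "array" ||
    repeated.name == s"${listName}_tuple"

val legacyArray  = repeatedGroupIsElementType("xs", PGroup("array", List("f")))
val legacyTuple  = repeatedGroupIsElementType("xs", PGroup("xs_tuple", List("f")))
val modernLayout = repeatedGroupIsElementType("xs", PGroup("list", List("element")))
```

In the modern three-level layout the repeated group wraps a single `element` field, so the dispatch falls through to clipping the element type instead.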



[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8537#discussion_r38765932
  
--- Diff: mllib/src/test/java/org/apache/spark/ml/source/JavaLibSVMRelationSuite.java ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source;
+
+import com.google.common.base.Charsets;
+import com.google.common.io.Files;
+
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.util.Utils;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+
+/**
+ * Test LibSVMRelation in Java.
+ */
+public class JavaLibSVMRelationSuite {
+  private transient JavaSparkContext jsc;
+  private transient SQLContext jsql;
+  private transient DataFrame dataset;
+
+  private File path;
+
+  @Before
+  public void setUp() throws IOException {
+jsc = new JavaSparkContext("local", "JavaLibSVMRelationSuite");
+jsql = new SQLContext(jsc);
+
+    path = Utils.createTempDir(System.getProperty("java.io.tmpdir"), "datasource")
+      .getCanonicalFile();
+if (path.exists()) {
+  path.delete();
+}
+
+String s = "1 1:1.0 3:2.0 5:3.0\n0\n0 2:4.0 4:5.0 6:6.0";
+Files.write(s, path, Charsets.US_ASCII);
+  }
+
+  @After
+  public void tearDown() {
+jsc.stop();
+jsc = null;
+path.delete();
+  }
+
+  @Test
+  public void verifyLibSVMDF() {
+    dataset = jsql.read().format("org.apache.spark.ml.source.libsvm").load(path.getPath());
+Assert.assertEquals("label", dataset.columns()[0]);
+Assert.assertEquals("features", dataset.columns()[1]);
+Row r = dataset.first();
+    Assert.assertEquals(Double.valueOf(r.getDouble(0)), Double.valueOf(1.0));
--- End diff --

* `Double.valueOf(...)` is not necessary.
* move `1.0` to the first position





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8537#discussion_r38765957
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/source/LibSVMRelationSuite.scala ---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source
+
+import java.io.File
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.source.libsvm._
+import org.apache.spark.mllib.linalg.{SparseVector, Vectors, DenseVector}
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.util.Utils
+
+class LibSVMRelationSuite extends SparkFunSuite with MLlibTestSparkContext {
+  var path: String = _
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+val lines =
+  """
+|1 1:1.0 3:2.0 5:3.0
+|0
+|0 2:4.0 4:5.0 6:6.0
+  """.stripMargin
+val tempDir = Utils.createTempDir()
+val file = new File(tempDir.getPath, "part-0")
+Files.write(lines, file, Charsets.US_ASCII)
+path = tempDir.toURI.toString
+  }
+
+  test("select as sparse vector") {
+    val df = sqlContext.read.options(Map("numFeatures" -> "6")).libsvm(path)
+assert(df.columns(0) == "label")
+assert(df.columns(1) == "features")
+val row1 = df.first()
+assert(row1.getDouble(0) == 1.0)
+    assert(row1.getAs[SparseVector](1) == Vectors.sparse(6, Seq((0, 1.0), (2, 2.0), (4, 3.0))))
+  }
+
+  test("select as dense vector") {
+    val df = sqlContext.read.options(Map("numFeatures" -> "6", "featuresType" -> "dense"))
+      .libsvm(path)
+assert(df.columns(0) == "label")
+assert(df.columns(1) == "features")
+assert(df.count() == 3)
+val row1 = df.first()
+assert(row1.getDouble(0) == 1.0)
+    assert(row1.getAs[DenseVector](1) == Vectors.dense(1.0, 0.0, 2.0, 0.0, 3.0, 0.0))
+  }
+
+  test("select without any option") {
--- End diff --

Should add another test that sets `numFeatures` to a larger number and 
verify it.
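To see why a larger `numFeatures` is worth its own test, here is an illustrative, self-contained single-line parser (an assumption-laden sketch, not Spark's `MLUtils.loadLibSVMFile`): feature indices beyond the data simply pad the vector with zeros.

```scala
// Parse one LibSVM-format line into (label, dense feature array) of a fixed
// width. Indices that never appear stay 0.0, so a numFeatures larger than the
// highest index in the data only pads the tail with zeros.
def parseLibSVMLine(line: String, numFeatures: Int): (Double, Array[Double]) = {
  val tokens = line.trim.split("\\s+")
  val label = tokens.head.toDouble
  val values = Array.fill(numFeatures)(0.0)
  tokens.tail.filter(_.nonEmpty).foreach { kv =>
    val parts = kv.split(":")
    values(parts(0).toInt - 1) = parts(1).toDouble // LibSVM indices are 1-based
  }
  (label, values)
}

val (label, vec) = parseLibSVMLine("1 1:1.0 3:2.0 5:3.0", numFeatures = 8)
```

With `numFeatures = 8` the suite's first sample parses to a vector of length 8 whose last three entries are zero, which is exactly what the extra test should assert.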





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8537#discussion_r38765946
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/source/LibSVMRelationSuite.scala ---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source
+
+import java.io.File
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.source.libsvm._
+import org.apache.spark.mllib.linalg.{SparseVector, Vectors, DenseVector}
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.util.Utils
+
+class LibSVMRelationSuite extends SparkFunSuite with MLlibTestSparkContext {
+  var path: String = _
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+val lines =
+  """
+|1 1:1.0 3:2.0 5:3.0
+|0
+|0 2:4.0 4:5.0 6:6.0
+  """.stripMargin
+val tempDir = Utils.createTempDir()
+val file = new File(tempDir.getPath, "part-0")
+Files.write(lines, file, Charsets.US_ASCII)
+path = tempDir.toURI.toString
+  }
+
+  test("select as sparse vector") {
+    val df = sqlContext.read.options(Map("numFeatures" -> "6")).libsvm(path)
--- End diff --

We can remove `"numFeatures" -> 6` in one test.





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8537#discussion_r38765935
  
--- Diff: mllib/src/test/java/org/apache/spark/ml/source/JavaLibSVMRelationSuite.java ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source;
+
+import com.google.common.base.Charsets;
+import com.google.common.io.Files;
+
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.util.Utils;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+
+/**
+ * Test LibSVMRelation in Java.
+ */
+public class JavaLibSVMRelationSuite {
+  private transient JavaSparkContext jsc;
+  private transient SQLContext jsql;
+  private transient DataFrame dataset;
+
+  private File path;
+
+  @Before
+  public void setUp() throws IOException {
+jsc = new JavaSparkContext("local", "JavaLibSVMRelationSuite");
+jsql = new SQLContext(jsc);
+
+    path = Utils.createTempDir(System.getProperty("java.io.tmpdir"), "datasource")
+      .getCanonicalFile();
+if (path.exists()) {
+  path.delete();
+}
+
+String s = "1 1:1.0 3:2.0 5:3.0\n0\n0 2:4.0 4:5.0 6:6.0";
+Files.write(s, path, Charsets.US_ASCII);
+  }
+
+  @After
+  public void tearDown() {
+jsc.stop();
+jsc = null;
+path.delete();
+  }
+
+  @Test
+  public void verifyLibSVMDF() {
+    dataset = jsql.read().format("org.apache.spark.ml.source.libsvm").load(path.getPath());
+Assert.assertEquals("label", dataset.columns()[0]);
+Assert.assertEquals("features", dataset.columns()[1]);
+Row r = dataset.first();
+    Assert.assertEquals(Double.valueOf(r.getDouble(0)), Double.valueOf(1.0));
+    Assert.assertEquals(r.getAs(1), Vectors.dense(1.0, 0.0, 2.0, 0.0, 3.0, 0.0));
--- End diff --

We need to check the class name first or cast it to `DenseVector` directly:

~~~java
DenseVector v = r.getAs(1);
Assert.assertEquals(Vectors.dense(...), v);
~~~

If it is a sparse vector, the first line will throw an error.
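A minimal sketch of the failure mode, with simplified stand-in vector classes (assumptions, not MLlib's): an unchecked cast of a sparse value to a dense type only fails at runtime, which is why the test should pin down the concrete type first.

```scala
// Simplified stand-ins for MLlib's vector hierarchy (assumptions).
sealed trait Vector
case class DenseVector(values: Array[Double]) extends Vector
case class SparseVector(size: Int, indices: Array[Int], values: Array[Double]) extends Vector

// Suppose the relation actually produced a sparse vector...
val v: Vector = SparseVector(3, Array(1), Array(5.0))

// ...then the cast the test relies on blows up at runtime, not compile time.
val castFailed =
  try { v.asInstanceOf[DenseVector]; false }
  catch { case _: ClassCastException => true }
```

Asserting on the concrete class (or pattern-matching) turns this silent runtime trap into an explicit, readable test failure.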





[GitHub] spark pull request: [SPARK-10117][MLLIB] Implement SQL data source...

2015-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8537#discussion_r38765912
  
--- Diff: mllib/src/test/java/org/apache/spark/ml/source/JavaLibSVMRelationSuite.java ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source;
+
+import com.google.common.base.Charsets;
+import com.google.common.io.Files;
+
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.util.Utils;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
--- End diff --

organize imports: java, scala, 3rd-party, spark.





[GitHub] spark pull request: [SPARK-10437][SQ] Support aggregation expressi...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8599#issuecomment-137778493
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9851] Support submitting map stages ind...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8180#issuecomment-137778525
  
Build started.





[GitHub] spark pull request: [SPARK-10437][SQ] Support aggregation expressi...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8599#issuecomment-137778526
  
Merged build started.





[GitHub] spark pull request: [SPARK-9851] Support submitting map stages ind...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8180#issuecomment-137778499
  
 Build triggered.





[GitHub] spark pull request: [SPARK-9851] Support submitting map stages ind...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8180#issuecomment-137780768
  
  [Test build #42004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42004/consoleFull) for PR 8180 at commit [`bb5190f`](https://github.com/apache/spark/commit/bb5190f0804f83fd178960bdf0ee857a65312859).





[GitHub] spark pull request: [SPARK-9851] Support submitting map stages ind...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8180#issuecomment-137781690
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42002/
Test FAILed.





[GitHub] spark pull request: [SPARK-10192] [core] simple test w/ failure in...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8402#issuecomment-137786063
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10192] [core] simple test w/ failure in...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8402#issuecomment-137786073
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41999/
Test FAILed.





[GitHub] spark pull request: [SPARK-10192] [core] simple test w/ failure in...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8402#issuecomment-137785853
  
  [Test build #41999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41999/console) for PR 8402 at commit [`cfcf4e6`](https://github.com/apache/spark/commit/cfcf4e667121b4225ce327f5f764b00677059865).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10446][SQL] Support to specify join typ...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8600#issuecomment-137786844
  
  [Test build #42001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42001/console) for PR 8600 at commit [`efe069a`](https://github.com/apache/spark/commit/efe069aabfb3b06f2a9884153bb035022265652f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10301] [SQL] Fixes schema merging for n...

2015-09-04 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8509#discussion_r38773863
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala ---
@@ -229,4 +229,81 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-10301 Clipping nested structs in requested schema") {
--- End diff --

Can we list all the cases that are tested here? Cases that should be covered:
* the two struct types have the same fields
* the two struct types have two totally different sets of fields
* one struct type is a superset of the other
* there are some common fields, but also fields that only exist in one file. I believe the ordering of fields also matters here. For example, for a struct in the global schema with fields `a, b, c, d`, a local struct with `a, d` and one with `a, b` are two different cases.
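The field-ordering point can be sketched with a toy clipping function (plain Scala, hypothetical, not the actual Parquet clipping code): clipping keeps the global field order and records which fields the local file provides, so `a, d` and `a, b` locals produce different results and each deserves its own test.

```scala
// Toy model: given the global field order and the set of fields a local file
// actually has, mark each global field with whether the file provides it.
def clipFields(global: List[String], local: Set[String]): List[(String, Boolean)] =
  global.map(f => (f, local.contains(f)))

val withAD = clipFields(List("a", "b", "c", "d"), Set("a", "d"))
val withAB = clipFields(List("a", "b", "c", "d"), Set("a", "b"))
// Same global schema, different local layouts, different clipped results.
```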




