[GitHub] spark pull request: Minor fix: made EXPLAIN output to play well ...

2014-06-16 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1097#issuecomment-46251700
  
Thanks. I'm merging this one. The test that failed was a Flume test that is 
sometimes flaky. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Follow up of PR #1071 for Java API

2014-06-16 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1085#issuecomment-46252146
  
FYI: this didn't get merged into branch-1.0, so I did a manual cherry-pick.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-46343296
  
I will test this today.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-46345169
  
This looks good to me. I will merge it.






[GitHub] spark pull request: SPARK-1293 [SQL] [WIP] Parquet support for nes...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/360#discussion_r13876519
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala ---
@@ -0,0 +1,667 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.parquet
+
+import scala.collection.mutable.{Buffer, ArrayBuffer, HashMap}
+
+import parquet.io.api.{PrimitiveConverter, GroupConverter, Binary, 
Converter}
+import parquet.schema.MessageType
+
+import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.catalyst.expressions.{GenericRow, Row, 
Attribute}
+import org.apache.spark.sql.parquet.CatalystConverter.FieldType
+
+/**
+ * Collection of converters of Parquet types (group and primitive types) 
that
+ * model arrays and maps. The conversions are partly based on the 
AvroParquet
+ * converters that are part of Parquet in order to be able to process these
+ * types.
+ *
+ * There are several types of converters:
+ * ul
+ *   li[[org.apache.spark.sql.parquet.CatalystPrimitiveConverter]] for 
primitive
+ *   (numeric, boolean and String) types/li
+ *   li[[org.apache.spark.sql.parquet.CatalystNativeArrayConverter]] for 
arrays
+ *   of native JVM element types; note: currently null values are not 
supported!/li
+ *   li[[org.apache.spark.sql.parquet.CatalystArrayConverter]] for 
arrays of
+ *   arbitrary element types (including nested element types); note: 
currently
+ *   null values are not supported!/li
+ *   li[[org.apache.spark.sql.parquet.CatalystStructConverter]] for 
structs/li
+ *   li[[org.apache.spark.sql.parquet.CatalystMapConverter]] for maps; 
note:
+ *   currently null values are not supported!/li
+ *   li[[org.apache.spark.sql.parquet.CatalystPrimitiveRowConverter]] 
for rows
+ *   of only primitive element types/li
+ *   li[[org.apache.spark.sql.parquet.CatalystGroupConverter]] for other 
nested
+ *   records, including the top-level row record/li
+ * /ul
+ */
+
+private[sql] object CatalystConverter {
+  // The type internally used for fields
+  type FieldType = StructField
+
+  // This is mostly Parquet convention (see, e.g., `ConversionPatterns`).
+  // Note that array for the array elements is chosen by ParquetAvro.
+  // Using a different value will result in Parquet silently dropping 
columns.
+  val ARRAY_ELEMENTS_SCHEMA_NAME = array
+  val MAP_KEY_SCHEMA_NAME = key
+  val MAP_VALUE_SCHEMA_NAME = value
+  val MAP_SCHEMA_NAME = map
+
+  // TODO: consider using Array[T] for arrays to avoid boxing of primitive 
types
+  type ArrayScalaType[T] = Seq[T]
+  type StructScalaType[T] = Seq[T]
+  type MapScalaType[K, V] = Map[K, V]
+
+  protected[parquet] def createConverter(
+  field: FieldType,
+  fieldIndex: Int,
+  parent: CatalystConverter): Converter = {
+val fieldType: DataType = field.dataType
+fieldType match {
+  // For native JVM types we use a converter with native arrays
+  case ArrayType(elementType: NativeType) = {
+new CatalystNativeArrayConverter(elementType, fieldIndex, parent)
+  }
+  // This is for other types of arrays, including those with nested 
fields
+  case ArrayType(elementType: DataType) = {
+new CatalystArrayConverter(elementType, fieldIndex, parent)
+  }
+  case StructType(fields: Seq[StructField]) = {
+new CatalystStructConverter(fields, fieldIndex, parent)
+  }
+  case MapType(keyType: DataType, valueType: DataType) = {
+new CatalystMapConverter(
+  Seq(
+new FieldType(MAP_KEY_SCHEMA_NAME, keyType, false),
+new FieldType(MAP_VALUE_SCHEMA_NAME, valueType, true)),
+fieldIndex,
+parent)
+  }
+  // Strings, Shorts and Bytes do not have a corresponding type in 
Parquet
+  // so we need to treat

[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-46348842
  
There was a conflict that I had to merge manually. Take a look at master to 
make sure everything is ok. I did compile and run a couple of things.





[GitHub] spark pull request: spark-submit: add exec at the end of the scrip...

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/858#issuecomment-46353884
  
Done.




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/999#issuecomment-46363656
  
Jenkins, retest this please.




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/999#discussion_r13891473
  
--- Diff: docs/sql-programming-guide.md ---
@@ -91,14 +91,33 @@ of its decedents.  To create a basic SQLContext, all 
you need is a SparkContext.
 
 {% highlight python %}
 from pyspark.sql import SQLContext
-sqlCtx = SQLContext(sc)
+sqlContext = SQLContext(sc)
 {% endhighlight %}
 
 /div
 
 /div
 
-## Running SQL on RDDs
+# Data Sources
+
+div class=codetabs
+div data-lang=scala  markdown=1
+Spark SQL supports operating on a variety of data sources though the 
SchemaRDD interface.
--- End diff --

best to put <code> ... </code> around SchemaRDD




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/999#discussion_r13891482
  
--- Diff: docs/sql-programming-guide.md ---
@@ -91,14 +91,33 @@ of its decedents.  To create a basic SQLContext, all 
you need is a SparkContext.
 
 {% highlight python %}
 from pyspark.sql import SQLContext
-sqlCtx = SQLContext(sc)
+sqlContext = SQLContext(sc)
 {% endhighlight %}
 
 /div
 
 /div
 
-## Running SQL on RDDs
+# Data Sources
+
+div class=codetabs
+div data-lang=scala  markdown=1
+Spark SQL supports operating on a variety of data sources though the 
SchemaRDD interface.
--- End diff --

and for Python/Java too




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/999#discussion_r13891733
  
--- Diff: docs/sql-programming-guide.md ---
@@ -297,50 +328,152 @@ JavaSchemaRDD teenagers = sqlCtx.sql(SELECT name 
FROM parquetFile WHERE age =
 div data-lang=python  markdown=1
 
 {% highlight python %}
+# sqlContext from the previous example is used in this example.
 
-peopleTable # The SchemaRDD from the previous example.
+schemaPeople # The SchemaRDD from the previous example.
 
 # SchemaRDDs can be saved as Parquet files, maintaining the schema 
information.
-peopleTable.saveAsParquetFile(people.parquet)
+schemaPeople.saveAsParquetFile(people.parquet)
 
 # Read in the Parquet file created above.  Parquet files are 
self-describing so the schema is preserved.
 # The result of loading a parquet file is also a SchemaRDD.
-parquetFile = sqlCtx.parquetFile(people.parquet)
+parquetFile = sqlContext.parquetFile(people.parquet)
 
 # Parquet files can also be registered as tables and then used in SQL 
statements.
 parquetFile.registerAsTable(parquetFile);
-teenagers = sqlCtx.sql(SELECT name FROM parquetFile WHERE age = 13 AND 
age = 19)
-
+teenagers = sqlContext.sql(SELECT name FROM parquetFile WHERE age = 13 
AND age = 19)
+teenNames = teenagers.map(lambda p: Name:  + p.name)
+for teenName in teenNames.collect():
+  print teenName
 {% endhighlight %}
 
 /div
 
 /div
 
-## Writing Language-Integrated Relational Queries
+## JSON Datasets
+div class=codetabs
 
-**Language-Integrated queries are currently only supported in Scala.**
+div data-lang=scala  markdown=1
+Spark SQL can automatically infer the schema of a JSON dataset and load it 
as a SchemaRDD.
+This conversion can be done using one of two methods in a SQLContext:
 
-Spark SQL also supports a domain specific language for writing queries.  
Once again,
-using the data from the above examples:
+* `jsonFile` - loads data from a directory of JSON files where each line 
of the files is a JSON object.
+* `jsonRdd` - loads data from an existing RDD where each element of the 
RDD is a string containing a JSON object.
 
 {% highlight scala %}
+// sc is an existing SparkContext.
 val sqlContext = new org.apache.spark.sql.SQLContext(sc)
-import sqlContext._
-val people: RDD[Person] = ... // An RDD of case class objects, from the 
first example.
 
-// The following is the same as 'SELECT name FROM people WHERE age = 10 
AND age = 19'
-val teenagers = people.where('age = 10).where('age = 19).select('name)
+// A JSON dataset is pointed to by path.
+// The path can be either a single text file or a directory storing text 
files.
+val path = examples/src/main/resources/people.json
+// Create a SchemaRDD from the file(s) pointed to by path
+val people = sqlContext.jsonFile(path)
+
+// The inferred schema can be visualized using the printSchema() method.
+people.printSchema()
+// The schema of people is ...
--- End diff --

i'd remove this line




[GitHub] spark pull request: SPARK-1293 [SQL] [WIP] Parquet support for nes...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/360#discussion_r13892570
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala ---
@@ -0,0 +1,667 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.parquet
+
+import scala.collection.mutable.{Buffer, ArrayBuffer, HashMap}
+
+import parquet.io.api.{PrimitiveConverter, GroupConverter, Binary, 
Converter}
+import parquet.schema.MessageType
+
+import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.catalyst.expressions.{GenericRow, Row, 
Attribute}
+import org.apache.spark.sql.parquet.CatalystConverter.FieldType
+
+/**
+ * Collection of converters of Parquet types (group and primitive types) 
that
+ * model arrays and maps. The conversions are partly based on the 
AvroParquet
+ * converters that are part of Parquet in order to be able to process these
+ * types.
+ *
+ * There are several types of converters:
+ * ul
+ *   li[[org.apache.spark.sql.parquet.CatalystPrimitiveConverter]] for 
primitive
+ *   (numeric, boolean and String) types/li
+ *   li[[org.apache.spark.sql.parquet.CatalystNativeArrayConverter]] for 
arrays
+ *   of native JVM element types; note: currently null values are not 
supported!/li
+ *   li[[org.apache.spark.sql.parquet.CatalystArrayConverter]] for 
arrays of
+ *   arbitrary element types (including nested element types); note: 
currently
+ *   null values are not supported!/li
+ *   li[[org.apache.spark.sql.parquet.CatalystStructConverter]] for 
structs/li
+ *   li[[org.apache.spark.sql.parquet.CatalystMapConverter]] for maps; 
note:
+ *   currently null values are not supported!/li
+ *   li[[org.apache.spark.sql.parquet.CatalystPrimitiveRowConverter]] 
for rows
+ *   of only primitive element types/li
+ *   li[[org.apache.spark.sql.parquet.CatalystGroupConverter]] for other 
nested
+ *   records, including the top-level row record/li
+ * /ul
+ */
+
+private[sql] object CatalystConverter {
+  // The type internally used for fields
+  type FieldType = StructField
+
+  // This is mostly Parquet convention (see, e.g., `ConversionPatterns`).
+  // Note that array for the array elements is chosen by ParquetAvro.
+  // Using a different value will result in Parquet silently dropping 
columns.
+  val ARRAY_ELEMENTS_SCHEMA_NAME = array
+  val MAP_KEY_SCHEMA_NAME = key
+  val MAP_VALUE_SCHEMA_NAME = value
+  val MAP_SCHEMA_NAME = map
+
+  // TODO: consider using Array[T] for arrays to avoid boxing of primitive 
types
+  type ArrayScalaType[T] = Seq[T]
+  type StructScalaType[T] = Seq[T]
+  type MapScalaType[K, V] = Map[K, V]
+
+  protected[parquet] def createConverter(
+  field: FieldType,
+  fieldIndex: Int,
+  parent: CatalystConverter): Converter = {
+val fieldType: DataType = field.dataType
+fieldType match {
+  // For native JVM types we use a converter with native arrays
+  case ArrayType(elementType: NativeType) = {
+new CatalystNativeArrayConverter(elementType, fieldIndex, parent)
+  }
+  // This is for other types of arrays, including those with nested 
fields
+  case ArrayType(elementType: DataType) = {
+new CatalystArrayConverter(elementType, fieldIndex, parent)
+  }
+  case StructType(fields: Seq[StructField]) = {
+new CatalystStructConverter(fields, fieldIndex, parent)
+  }
+  case MapType(keyType: DataType, valueType: DataType) = {
+new CatalystMapConverter(
+  Seq(
+new FieldType(MAP_KEY_SCHEMA_NAME, keyType, false),
+new FieldType(MAP_VALUE_SCHEMA_NAME, valueType, true)),
+fieldIndex,
+parent)
+  }
+  // Strings, Shorts and Bytes do not have a corresponding type in 
Parquet
+  // so we need to treat

[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/999#discussion_r13892635
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala 
---
@@ -123,4 +125,53 @@ abstract class QueryPlan[PlanType : 
TreeNode[PlanType]] extends TreeNode[PlanTy
   case other = Nil
 }.toSeq
   }
+
+  protected def generateSchemaTreeString(schema: Seq[Attribute]): String = 
{
+val builder = new StringBuilder
+builder.append(root\n)
+val prefix =  |
+schema.foreach { attribute =
+  val name = attribute.name
+  val dataType = attribute.dataType
+  dataType match {
+case fields: StructType =
+  builder.append(s$prefix-- $name: $StructType\n)
+  generateSchemaTreeString(fields, s$prefix|, builder)
+case ArrayType(fields: StructType) =
+  builder.append(s$prefix-- $name: $ArrayType[$StructType]\n)
+  generateSchemaTreeString(fields, s$prefix|, builder)
+case ArrayType(elementType: DataType) =
+  builder.append(s$prefix-- $name: $ArrayType[$elementType]\n)
+case _ = builder.append(s$prefix-- $name: $dataType\n)
+  }
+}
+
+builder.toString()
+  }
+
+  protected def generateSchemaTreeString(
+  schema: StructType,
+  prefix: String,
+  builder: StringBuilder): StringBuilder = {
+schema.fields.foreach {
+  case StructField(name, fields: StructType, _) =
+builder.append(s$prefix-- $name: $StructType\n)
+generateSchemaTreeString(fields, s$prefix|, builder)
+  case StructField(name, ArrayType(fields: StructType), _) =
+builder.append(s$prefix-- $name: $ArrayType[$StructType]\n)
+generateSchemaTreeString(fields, s$prefix|, builder)
+  case StructField(name, ArrayType(elementType: DataType), _) =
+builder.append(s$prefix-- $name: $ArrayType[$elementType]\n)
+  case StructField(name, fieldType: DataType, _) =
+builder.append(s$prefix-- $name: $fieldType\n)
+}
+
+builder
+  }
+
+  /** Returns the output schema in the tree format. */
+  def schemaTreeString: String = generateSchemaTreeString(output)
--- End diff --

maybe just schemaString




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/999#discussion_r13892709
  
--- Diff: sql/core/pom.xml ---
@@ -54,6 +61,11 @@
   version${parquet.version}/version
 /dependency
 dependency
+  groupIdcom.fasterxml.jackson.core/groupId
+  artifactIdjackson-core/artifactId
+  version2.3.2/version
--- End diff --

@pwendell I think in general sub-project pom files don't specify dependency 
versions. Can you verify?




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/999#discussion_r13892874
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -99,6 +97,37 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
 new SchemaRDD(this, parquet.ParquetRelation(path))
 
   /**
+   * Loads a JSON file (one object per line), returning the result as a 
[[SchemaRDD]].
--- End diff --

Maybe add a line explaining this goes through the data once to infer the 
schema ...




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/999#discussion_r13892881
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -99,6 +97,35 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
 new SchemaRDD(this, parquet.ParquetRelation(path))
 
   /**
+   * Loads a JSON file (one object per line), returning the result as a 
[[SchemaRDD]].
+   *
+   * @group userf
+   */
+  def jsonFile(path: String): SchemaRDD = jsonFile(path, 1.0)
+
+  /**
+   * :: Experimental ::
+   */
--- End diff --

here too, although with sampling




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/999#discussion_r13893161
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala ---
@@ -342,13 +344,34 @@ class SchemaRDD(
   def toJavaSchemaRDD: JavaSchemaRDD = new JavaSchemaRDD(sqlContext, 
logicalPlan)
 
   private[sql] def javaToPython: JavaRDD[Array[Byte]] = {
--- End diff --

add some inline doc explaining this is used for the Python API.




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/999#discussion_r13893257
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala 
---
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.json
+
+import scala.collection.JavaConversions._
+import scala.math.BigDecimal
+
+import com.fasterxml.jackson.databind.ObjectMapper
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.analysis.HiveTypeCoercion
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.execution.{ExistingRdd, SparkLogicalPlan}
+import org.apache.spark.sql.Logging
+
+private[sql] object JsonRDD extends Logging {
+
+  private[sql] def inferSchema(
+  json: RDD[String],
+  samplingRatio: Double = 1.0): LogicalPlan = {
+require(samplingRatio  0)
--- End diff --

add a more meaningful exception message, i.e.
```
require(samplingRatio > 0, s"samplingRatio ($samplingRatio) should be greater than 0")
```




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/999#discussion_r13893273
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala 
---
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.json
+
+import scala.collection.JavaConversions._
+import scala.math.BigDecimal
+
+import com.fasterxml.jackson.databind.ObjectMapper
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.analysis.HiveTypeCoercion
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.execution.{ExistingRdd, SparkLogicalPlan}
+import org.apache.spark.sql.Logging
+
+private[sql] object JsonRDD extends Logging {
+
+  private[sql] def inferSchema(
+  json: RDD[String],
+  samplingRatio: Double = 1.0): LogicalPlan = {
+require(samplingRatio  0)
+val schemaData = if (samplingRatio  0.99) json else 
json.sample(false, samplingRatio, 1)
+
--- End diff --

probably no need to have a blank line for each statement ...




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/999#issuecomment-46380653
  
This looks good to me overall. Only a few nitpicks. 

I think we should merge it after you address the couple of comments I had.




[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1109#issuecomment-46383100
  
Thanks. Merging this in master & branch-1.0.




[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1109#issuecomment-46383668
  
Actually the merge script failed for this pull request. @pwendell any idea?
```
 ./merge_spark_pr.py 
Which pull request would you like to merge? (e.g. 34): 1109

=== Pull Request #1109 ===
title   SPARK-2170: Fix for global name 'PIPE' is not defined
source  gregakespret/spark-ec2-subprocess-script
target  master
url https://api.github.com/repos/apache/spark/pulls/1109

Proceed with merging pull request #1109? (y/n): y
From github.com:apache/spark
 * [new ref] refs/pull/1109/head -> PR_TOOL_MERGE_PR_1109
From https://git-wip-us.apache.org/repos/asf/spark
 * [new branch]  master -> PR_TOOL_MERGE_PR_1109_MASTER
Switched to branch 'PR_TOOL_MERGE_PR_1109_MASTER'
Automatic merge went well; stopped before committing as requested
Traceback (most recent call last):
  File "./merge_spark_pr.py", line 316, in <module>
    merge_hash = merge_pr(pr_num, target_ref)
  File "./merge_spark_pr.py", line 152, in merge_pr
    run_cmd(['git', 'commit', '--author=%s' % primary_author] + merge_message_flags)
  File "./merge_spark_pr.py", line 78, in run_cmd
    return subprocess.check_output(cmd)
  File "/Users/rxin/anaconda/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['git', 'commit', '--author=Grega 
Kespret gr...@celtra.com', '-m', uSPARK-2170: Fix for global name 'PIPE' is 
not defined, '-m', 
u'https://issues.apache.org/jira/browse/SPARK-2170\r\n\r\nBefore this fix, when 
running ./spark-ec2 script:\r\n\r\n```\r\nTraceback (most recent call 
last):\r\n  File ./spark_ec2.py, line 894, in module\r\n
main()\r\n  File ./spark_ec2.py, line 886, in main\r\n
real_main()\r\n  File ./spark_ec2.py, line 770, in real_main\r\n
setup_cluster(conn, master_nodes, slave_nodes, opts, True)\r\n  File 
./spark_ec2.py, line 475, in setup_cluster\r\ndot_ssh_tar = 
ssh_read(master, opts, [\'tar\', \'c\', \'.ssh\'])\r\n  File 
./spark_ec2.py, line 709, in ssh_read\r\nssh_command(opts) + 
[\'%s@%s\' % (opts.user, host), stringify_command(command)])\r\n  File 
./spark_ec2.py, line 696, in _check_output\r\nprocess = 
subprocess.Popen(stdout=PIPE, *popenargs, **kwargs)\r\nNameError: global name \'PIPE\' is not 
defined\r\n```', '-m', 'Author: Grega Kespret gr...@celtra.com', '-m', 
u'Closes #1109 from gregakespret/spark-ec2-subprocess-script and squashes the 
following commits:', '-m', 4168dc6 [Grega Kespret] Fix for global name 'PIPE' 
is not defined]' returned non-zero exit status 1
```




[GitHub] spark pull request: [SPARK-2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/999#issuecomment-46389105
  
Thanks. I'm merging this in master & branch-1.0.




[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1109#issuecomment-46398768
  
@gregakespret since this has been fixed already in master, do you mind 
closing this pr?




[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1109#issuecomment-46399205
  
Yup, looks like a race condition (in a good way). Thanks a lot for catching this!





[GitHub] spark pull request: Compression should be a setting for individual...

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1091#issuecomment-46405754
  
Thanks for working on this, @ScrapCodes. I talked with Matei, and while we 
both agree compression would be better set on a per-RDD basis, adding another 
boolean flag to StorageLevel is not ideal. 

Matei suggested deferring this and we will come up with a proper design 
later. 

```
We should come up with a proper design for this. I think one viable design 
is to make StorageLevel get constructed via a builder pattern. More generally 
in the future I’d like to have something called StorageStrategy that can also 
convert the data into a format (e.g. columnar or something). Kind of like 
including a serializer in the storage level.
```
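A hedged sketch of the builder-style construction described above; `MyStorageLevel` and its methods are hypothetical stand-ins, not Spark's actual StorageLevel API.

```scala
// Hypothetical sketch only -- not Spark's real StorageLevel. It shows how a
// builder-style API could add options such as compression per RDD without
// growing the list of boolean constructor flags.
case class MyStorageLevel(
    useDisk: Boolean = false,
    useMemory: Boolean = true,
    deserialized: Boolean = true,
    compressed: Boolean = false,
    replication: Int = 1) {
  def withDisk: MyStorageLevel = copy(useDisk = true)
  def withCompression: MyStorageLevel = copy(compressed = true, deserialized = false)
  def withReplication(n: Int): MyStorageLevel = copy(replication = n)
}

// Usage: each RDD picks its own level instead of relying on a global flag.
val diskCompressed = MyStorageLevel().withDisk.withCompression.withReplication(2)
```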




[GitHub] spark pull request: Minor fix

2014-06-18 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1105#discussion_r13903361
  
--- Diff: core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala 
---
@@ -91,8 +91,13 @@ private[spark] object MetadataCleaner {
 conf.set(MetadataCleanerType.systemProperty(cleanerType),  
delay.toString)
   }
 
+  /**
+   * Set the default delay time( in seconds).
--- End diff --

can you put the space before ( instead of after? Thanks!




[GitHub] spark pull request: [SPARK-2162] Double check in doGetLocal to avo...

2014-06-18 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1103#discussion_r13903517
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -363,6 +363,12 @@ private[spark] class BlockManager(
 val info = blockInfo.get(blockId).orNull
 if (info != null) {
   info.synchronized {
+// Double check to make sure the block is still there, since it
+// might has been removed when we actually come here.
--- End diff --

has -> have

also can you point out in the comment that this only works because 
removeBlock also synchronizes on the block info object?

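To make the locking argument concrete, here is a minimal sketch (simplified types and maps, not the actual BlockManager code) of why the second lookup inside `info.synchronized` only helps because `removeBlock` synchronizes on the same block info object:

```scala
import scala.collection.concurrent.TrieMap

class BlockInfo
class BlockData

class SketchBlockStore {
  private val blockInfo = TrieMap.empty[String, BlockInfo]
  private val blockData = TrieMap.empty[String, BlockData]

  def doGetLocal(blockId: String): Option[BlockData] = {
    val info = blockInfo.get(blockId).orNull
    if (info == null) return None
    info.synchronized {
      // Double check: the block may have been removed between the first lookup
      // and acquiring the lock. The check is only meaningful because removeBlock
      // holds the same info lock while it mutates both maps.
      if (blockInfo.contains(blockId)) blockData.get(blockId) else None
    }
  }

  def removeBlock(blockId: String): Unit = {
    blockInfo.get(blockId).foreach { info =>
      info.synchronized {
        blockData.remove(blockId)
        blockInfo.remove(blockId)
      }
    }
  }
}
```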



[GitHub] spark pull request: [SPARK-2162] Double check in doGetLocal to avo...

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1103#issuecomment-46406863
  
This LGTM actually. Makes sense to do another check within the synchronized 
block in case a block is being removed by another thread. 




[GitHub] spark pull request: Fix for Spark-2151

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1095#issuecomment-46407129
  
Jenkins, test this please.




[GitHub] spark pull request: Fix for Spark-2151

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1095#issuecomment-46407125
  
Do you mind updating the pull request title to say something like 
[SPARK-2151] Recognize memory format for spark-submit?




[GitHub] spark pull request: SPARK-2038: rename conf parameters in the sa...

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1087#issuecomment-46408239
  
Just leaving a note that this PR has been reverted, because changing the 
parameter name in Scala makes the function no longer source-compatible ...
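For context, a small hypothetical example (the method name and signature are made up for illustration) of why a parameter rename is source-breaking in Scala: parameter names are part of the source API because callers may pass arguments by name.

```scala
object RenameExample {
  // Hypothetical API, version 1.
  def save(path: String, conf: Map[String, String] = Map.empty): Unit = ()

  // A caller using named arguments compiles today ...
  save(path = "/tmp/out", conf = Map("compress" -> "true"))
  // ... but would stop compiling if `conf` were renamed (e.g. to `hadoopConf`),
  // while positional calls like save("/tmp/out", Map("compress" -> "true"))
  // would keep working.
}
```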




[GitHub] spark pull request: [SPARK-2176][SQL] Extra unnecessary exchange o...

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1116#issuecomment-46469732
  
That's not a bad idea. Also, we should add more documentation. While Spark 
SQL code in general is extremely concise, it can be hard to understand 
(especially the optimizer rules) for people less familiar with Scala and the 
tree library itself. 




[GitHub] spark pull request: [SPARK-2176][SQL] Extra unnecessary exchange o...

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1116#issuecomment-46469830
  
Thanks. Merging this in master & branch-1.0.




[GitHub] spark pull request: [SPARK-2162] Double check in doGetLocal to avo...

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1103#issuecomment-46470613
  
Thanks. I'm merging this in master.




[GitHub] spark pull request: Updated the comment for SPARK-2162.

2014-06-18 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1117

Updated the comment for SPARK-2162.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-2162

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1117


commit a4231deb2a480196194fe2b0a819cff60354e3cf
Author: Reynold Xin r...@apache.org
Date:   2014-06-18T18:03:48Z

Updated the comment for SPARK-2162.






[GitHub] spark pull request: SPARK-2038: rename conf parameters in the sa...

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1087#issuecomment-46477205
  
That's a very good idea. We should probably have an API-breaking label on 
JIRA.




[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-18 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1118#discussion_r13936033
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/hiveOperators.scala 
---
@@ -445,7 +445,19 @@ case class NativeCommand(
 if (sideEffectResult.size == 0) {
   context.emptyResult
 } else {
-  val rows = sideEffectResult.map(r = new GenericRow(Array[Any](r)))
+  // TODO: Need a better way to handle the result of a native command.
+  // We may want to consider to use JsonMetaDataFormatter in Hive.
+  val isDescribe = sql.trim.startsWith(describe)
+  val rows = if (isDescribe) {
+// TODO: If we upgrade Hive to 0.13, we need to check the results 
of
+// context.sessionState.isHiveServerQuery() to determine how to 
split the result.
+// This method is introduced by 
https://issues.apache.org/jira/browse/HIVE-4545.
+// Right now, we split every string by any number of consecutive 
spaces.
+sideEffectResult.map(
+  r = r.split(\\s+)).map(r = new 
GenericRow(r.asInstanceOf[Array[Any]]))
--- End diff --

actually for describe can we only split up to 3 columns?
```scala
scala> "a b c d e".split("\\s+", 3)
res2: Array[String] = Array(a, b, c d e)
```
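A hedged sketch of how that suggestion could look when building the rows (plain Scala with `GenericRow` left out; the sample DESCRIBE line is made up):

```scala
// Splitting with a limit of 3 keeps the trailing comment column intact even if
// it contains whitespace.
val sideEffectResult = Seq("name    string    the person's name")
val rows = sideEffectResult.map(_.split("\\s+", 3))
rows.foreach(cols => println(cols.mkString(" | ")))
// prints: name | string | the person's name
```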




[GitHub] spark pull request: [SPARK-2151] Recognize memory format for spark...

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1095#issuecomment-46484800
  
Jenkins, retest this please.




[GitHub] spark pull request: Updated the comment for SPARK-2162.

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1117#issuecomment-46484824
  
Merged in master.




[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-18 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1118#discussion_r13937999
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/hiveOperators.scala 
---
@@ -445,7 +445,19 @@ case class NativeCommand(
 if (sideEffectResult.size == 0) {
   context.emptyResult
 } else {
-  val rows = sideEffectResult.map(r = new GenericRow(Array[Any](r)))
+  // TODO: Need a better way to handle the result of a native command.
+  // We may want to consider to use JsonMetaDataFormatter in Hive.
--- End diff --

That sounds good. Let's merge this first and submit another PR for that. 
(Reason is this should make it into 1.0.1)




[GitHub] spark pull request: Remove unicode operator from RDD.scala

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1119#issuecomment-46492402
  
@ash211 ...




[GitHub] spark pull request: Remove unicode operator from RDD.scala

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1119#issuecomment-46500211
  
Thanks. I've merged this.




[GitHub] spark pull request: SPARK-2038: rename conf parameters in the sa...

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1087#issuecomment-46500185
  
Yup, I added the api-breaking label to the ticket.




[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/143#issuecomment-46500308
  
That test is flaky and being fixed right now. 




[GitHub] spark pull request: [SPARK-2184][SQL] AddExchange isn't idempotent

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1122#issuecomment-46512587
  
I'm merging this in master & branch-1.0. Thanks.




[GitHub] spark pull request: [SPARK-2187] Explain should not run the optimi...

2014-06-18 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1123

[SPARK-2187] Explain should not run the optimizer twice.

@yhuai @marmbrus

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark explain

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1123


commit a9d3ba877ce8920f0c4a348c6b32b1f4f7f39427
Author: Reynold Xin r...@apache.org
Date:   2014-06-19T01:19:27Z

[SPARK-2187] Explain should not run the optimizer twice.






[GitHub] spark pull request: [SPARK-2187] Explain should not run the optimi...

2014-06-18 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1123#discussion_r13949881
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -71,16 +72,24 @@ case class SetCommand(
 }
 
 /**
+ * An explain command for users to see how a command will be executed.
+ *
+ * Note that this command takes in a logical plan, runs the optimizer on 
the logical plan
+ * (but do NOT actually execute it).
+ *
  * :: DeveloperApi ::
  */
 @DeveloperApi
 case class ExplainCommand(
-child: SparkPlan, output: Seq[Attribute])(
+logicalPlan: LogicalPlan, output: Seq[Attribute])(
 @transient context: SQLContext)
-  extends UnaryNode with Command {
+  extends LeafNode with Command {
 
-  // Actually EXPLAIN command doesn't cause any side effect.
-  override protected[sql] lazy val sideEffectResult: Seq[String] = 
this.toString.split(\n)
+  // Run through the optimizer to generate the physical plan.
+  // This is really side effect free but we follow the infrastructure 
anyway...
--- End diff --

I hope so. That was an old comment I just rewrote anyway ...




[GitHub] spark pull request: [SPARK-2187] Explain should not run the optimi...

2014-06-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1123#issuecomment-46525610
  
Ok I am merging this in master & branch-1.0.




[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1118#discussion_r13954580
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -17,7 +17,7 @@
 
 package org.apache.spark.sql.hive
 
-import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.{SQLContext}
--- End diff --

no need to change this 




[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1118#discussion_r13954638
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/commands.scala
 ---
@@ -60,3 +60,16 @@ case class ExplainCommand(plan: LogicalPlan) extends 
Command {
  * Returned for the CACHE TABLE tableName and UNCACHE TABLE tableName 
command.
  */
 case class CacheCommand(tableName: String, doCache: Boolean) extends 
Command
+
+/**
+ * Returned for the Describe tableName command.
+ */
+case class DescribeCommand(
--- End diff --

would be great to explain isFormatted / isExtended in @param.




[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1118#discussion_r13954626
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 ---
@@ -257,6 +250,88 @@ class HiveQuerySuite extends HiveComparisonTest {
 assert(Try(q0.count()).isSuccess)
   }
 
+  test("Describe commands") {
--- End diff --

To be consistent, either lowercase the d or uppercase the whole word DESCRIBE.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1118#discussion_r13954661
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -81,6 +81,20 @@ private[hive] trait HiveStrategies {
 def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
   case logical.NativeCommand(sql) =>
 NativeCommand(sql, plan.output)(context) :: Nil
+  case describe: logical.DescribeCommand => {
+val resolvedTable = context.executePlan(describe.table).analyzed
+resolvedTable match {
+  case t: MetastoreRelation =>
+Seq(DescribeHiveTableCommand(
+  t, describe.output, describe.isFormatted, 
describe.isExtended)(context))
+  case o: LogicalPlan =>
+if (describe.isFormatted)
--- End diff --

Maybe for non-metastore tables, we can just add some formatted/extended 
information saying they are registered as temporary tables? Then we can get rid 
of the extra lines here ...
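Something roughly like the following, just to illustrate the idea (the 
DescribeTempTableCommand operator here is hypothetical, not existing code):

```scala
case o: LogicalPlan =>
  // Hypothetical: describe temporary tables uniformly; the extended/formatted
  // section would simply note that the relation is only registered as a
  // temporary table, instead of special-casing isFormatted/isExtended here.
  Seq(DescribeTempTableCommand(planLater(o), describe.output)(context))
```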


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1118#discussion_r13954704
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -362,13 +367,19 @@ private[hive] object HiveQl {
 }
   }
 
+  protected def extractDbNameTableName(tableNameParts: Node): 
(Option[String], String) = {
+val (db, tableName) =
+  tableNameParts.getChildren.map{ case Token(part, Nil) => 
cleanIdentifier(part)} match {
+case Seq(tableOnly) => (None, tableOnly)
+case Seq(databaseName, table) => (Some(databaseName), table)
+  }
+
+(db, tableName)
+  }
+
   protected def nodeToPlan(node: Node): LogicalPlan = node match {
 // Just fake explain for any of the native commands.
-case Token("TOK_EXPLAIN", explainArgs) if nativeCommands contains 
explainArgs.head.getText =>
-  ExplainCommand(NoRelation)
-// Create tables aren't native commands due to CTAS queries, but we 
still don't need to
-// explain them.
-case Token("TOK_EXPLAIN", explainArgs) if explainArgs.head.getText == 
"TOK_CREATETABLE" =>
+case Token("TOK_EXPLAIN", explainArgs) if noExplainCommands contains 
explainArgs.head.getText =>
--- End diff --

avoid infix contains here, i.e. 
```scala
noExplainCommands.contains(explainArgs.head.getText)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1118#discussion_r13954716
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -362,13 +367,19 @@ private[hive] object HiveQl {
 }
   }
 
+  protected def extractDbNameTableName(tableNameParts: Node): 
(Option[String], String) = {
+val (db, tableName) =
+  tableNameParts.getChildren.map{ case Token(part, Nil) => 
cleanIdentifier(part)} match {
--- End diff --

space after map, and before the closing }


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1118#issuecomment-46534190
  
Hmmm, a lot of tests are failing because the output doesn't exactly match 
Hive's ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1293 [SQL] [WIP] Parquet support for nes...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/360#issuecomment-46593227
  
That test has been flaky. We are fixing it. 




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1293 [SQL] [WIP] Parquet support for nes...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/360#issuecomment-46593240
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2196] [SQL] Fix nullability of CaseWhen...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1133#issuecomment-46611219
  
@concretevitamin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1293 [SQL] [WIP] Parquet support for nes...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/360#issuecomment-46612027
  
@AndreSchumacher do u mind removing the [WIP] tag from the pull request?

Unfortunately due to the avro version bump, we can't include this in 1.0.1. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1136#issuecomment-46612533
  
Any idea why the having test from Hive is not runnable?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: A few minor Spark SQL Scaladoc fixes.

2014-06-19 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1139

A few minor Spark SQL Scaladoc fixes.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark sparksqldoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1139


commit 66dc72c49afd8f68e960e6e940b340ac29075fd7
Author: Reynold Xin r...@apache.org
Date:   2014-06-19T21:11:37Z

A few minor Spark SQL Scaladoc fixes.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2191][SQL] Make sure InsertIntoHiveTabl...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1129#issuecomment-46617536
  
I've merged this in master & branch-1.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: A few minor Spark SQL Scaladoc fixes.

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1139#issuecomment-46619076
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1118#discussion_r14001780
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/commands.scala
 ---
@@ -60,3 +60,23 @@ case class ExplainCommand(plan: LogicalPlan) extends 
Command {
  * Returned for the CACHE TABLE tableName and UNCACHE TABLE tableName 
command.
  */
 case class CacheCommand(tableName: String, doCache: Boolean) extends 
Command
+
+/**
--- End diff --

remove this block


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1118#discussion_r14002250
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala
 ---
@@ -144,6 +144,10 @@ abstract class HiveComparisonTest
   case _: SetCommand => Seq("0")
   case _: LogicalNativeCommand => 
answer.filterNot(nonDeterministicLine).filterNot(_ == "")
   case _: ExplainCommand => answer
+  case _: DescribeCommand =>
--- End diff --

add some inline comment explaining what you are filtering


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1136#issuecomment-46634535
  
Thanks, @willb. There is at least one problem I found - I think you'd need 
to add a cast to the having expression. Otherwise, try running the following:
```select key, count(*) c from src group by key having c```

In Hive this returns nothing, but in Spark SQL with this patch it throws a 
runtime exception failing to cast integer to boolean.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1136#issuecomment-46635173
  
To be more specific, I think you can always add a cast that casts the having 
expression to boolean, and then we have SimplifyCasts in the optimizer that 
would remove unnecessary casts. 
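For example, when building the Filter node for the HAVING clause (a sketch 
only; havingExpr and aggregatePlan here stand for the parsed predicate and the 
child Aggregate plan):

```scala
// Coerce the HAVING predicate to boolean; if it is already boolean,
// the SimplifyCasts optimizer rule strips the redundant Cast again.
val havingFilter = Filter(Cast(havingExpr, BooleanType), aggregatePlan)
```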


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: A few minor Spark SQL Scaladoc fixes.

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1139#issuecomment-46636241
  
Ok merging this in master & branch-1.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: More minor scaladoc cleanup for Spark SQL.

2014-06-19 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1142

More minor scaladoc cleanup for Spark SQL.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark sqlclean

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1142


commit 67a789e9dd8277b5ea3697af4bf07667084ad88a
Author: Reynold Xin r...@apache.org
Date:   2014-06-20T02:21:29Z

More minor scaladoc cleanup for Spark SQL.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1143

[SPARK-2209][SQL] Cast shouldn't do null check twice.

Also took the chance to clean up cast a little bit. Too many arrows on each 
line before!


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark cast

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1143.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1143


commit c2b88aee347edab3d36475ef75b30a1d2f15b1c1
Author: Reynold Xin r...@apache.org
Date:   2014-06-20T02:43:06Z

[SPARK-2209][SQL] Cast shouldn't do null check twice.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1136#issuecomment-46642243
  
That's definitely a bug - I will take a look at it later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1136#issuecomment-46644761
  
I found the issue and fixed it. Will push out a pull request soon.

If you can just add the boolean cast (always add it - no need to check if 
the type is already boolean since once I fix the bug, the extra cast on boolean 
value will be removed), that'd be great.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1293 [SQL] [WIP] Parquet support for nes...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/360#issuecomment-46644798
  
That sounds good. If you can just comment that test out for now, that'd be 
great.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: More minor scaladoc cleanup for Spark SQL.

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1142#issuecomment-46646143
  
Ok merging this in master & branch-1.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1136#issuecomment-46646244
  
Here's the patch: https://github.com/apache/spark/pull/1144


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2210] boolean cast on boolean value sho...

2014-06-19 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1144

[SPARK-2210] boolean cast on boolean value should be removed.

```
explain select cast(cast(key=0 as boolean) as boolean) aaa from src
```
should be
```
[Physical execution plan:]
[Project [(key#10:0 = 0) AS aaa#7]]
[ HiveTableScan [key#10], (MetastoreRelation default, src, None), None]
```

However, it is currently
```
[Physical execution plan:]
[Project [NOT((key#10=0) = 0) AS aaa#7]]
[ HiveTableScan [key#10], (MetastoreRelation default, src, None), None]
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark booleancast

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1144.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1144


commit c4e543d9802641e4f7ddb2cc2ae08c05962a5b44
Author: Reynold Xin r...@apache.org
Date:   2014-06-20T05:35:23Z

[SPARK-2210] boolean cast on boolean value should be removed.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1143#discussion_r14007468
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -104,85 +121,118 @@ case class Cast(child: Expression, dataType: 
DataType) extends UnaryExpression {
   }
 
   // Timestamp to long, converting milliseconds to seconds
-  private def timestampToLong(ts: Timestamp) = ts.getTime / 1000
+  private[this] def timestampToLong(ts: Timestamp) = ts.getTime / 1000
 
-  private def timestampToDouble(ts: Timestamp) = {
+  private[this] def timestampToDouble(ts: Timestamp) = {
 // First part is the seconds since the beginning of time, followed by 
nanosecs.
 ts.getTime / 1000 + ts.getNanos.toDouble / 10
   }
 
-  def castToLong: Any => Any = child.dataType match {
-case StringType => nullOrCast[String](_, s => try s.toLong catch {
-  case _: NumberFormatException => null
-})
-case BooleanType => nullOrCast[Boolean](_, b => if(b) 1L else 0L)
-case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t))
-case DecimalType => nullOrCast[BigDecimal](_, _.toLong)
-case x: NumericType => b => 
x.numeric.asInstanceOf[Numeric[Any]].toLong(b)
-  }
-
-  def castToInt: Any => Any = child.dataType match {
-case StringType => nullOrCast[String](_, s => try s.toInt catch {
-  case _: NumberFormatException => null
-})
-case BooleanType => nullOrCast[Boolean](_, b => if(b) 1 else 0)
-case TimestampType => nullOrCast[Timestamp](_, t => 
timestampToLong(t).toInt)
-case DecimalType => nullOrCast[BigDecimal](_, _.toInt)
-case x: NumericType => b => 
x.numeric.asInstanceOf[Numeric[Any]].toInt(b)
-  }
-
-  def castToShort: Any => Any = child.dataType match {
-case StringType => nullOrCast[String](_, s => try s.toShort catch {
-  case _: NumberFormatException => null
-})
-case BooleanType => nullOrCast[Boolean](_, b => if(b) 1.toShort else 
0.toShort)
-case TimestampType => nullOrCast[Timestamp](_, t => 
timestampToLong(t).toShort)
-case DecimalType => nullOrCast[BigDecimal](_, _.toShort)
-case x: NumericType => b => 
x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toShort
-  }
-
-  def castToByte: Any => Any = child.dataType match {
-case StringType => nullOrCast[String](_, s => try s.toByte catch {
-  case _: NumberFormatException => null
-})
-case BooleanType => nullOrCast[Boolean](_, b => if(b) 1.toByte else 
0.toByte)
-case TimestampType => nullOrCast[Timestamp](_, t => 
timestampToLong(t).toByte)
-case DecimalType => nullOrCast[BigDecimal](_, _.toByte)
-case x: NumericType => b => 
x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toByte
-  }
-
-  def castToDecimal: Any => Any = child.dataType match {
-case StringType => nullOrCast[String](_, s => try 
BigDecimal(s.toDouble) catch {
-  case _: NumberFormatException => null
-})
-case BooleanType => nullOrCast[Boolean](_, b => if(b) BigDecimal(1) 
else BigDecimal(0))
+  private[this] def castToLong: Any => Any = child.dataType match {
+case StringType =>
+  buildCast[String](_, s => try s.toLong catch {
+case _: NumberFormatException => null
+  })
--- End diff --

Try is really slow though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1143#discussion_r14007471
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -24,72 +24,89 @@ import org.apache.spark.sql.catalyst.types._
 /** Cast the child expression to the target data type. */
 case class Cast(child: Expression, dataType: DataType) extends 
UnaryExpression {
   override def foldable = child.foldable
-  def nullable = (child.dataType, dataType) match {
+
+  override def nullable = (child.dataType, dataType) match {
 case (StringType, _: NumericType) => true
 case (StringType, TimestampType)  => true
 case _ => child.nullable
   }
+
   override def toString = s"CAST($child, $dataType)"
 
   type EvaluatedType = Any
 
-  def nullOrCast[T](a: Any, func: T => Any): Any = if(a == null) {
-null
-  } else {
-func(a.asInstanceOf[T])
-  }
+  // [[func]] assumes the input is no longer null because eval already 
does the null check.
+  @inline private[this] def buildCast[T](a: Any, func: T => Any): Any = 
func(a.asInstanceOf[T])
 
   // UDFToString
-  def castToString: Any => Any = child.dataType match {
-case BinaryType => nullOrCast[Array[Byte]](_, new String(_, "UTF-8"))
-case _ => nullOrCast[Any](_, _.toString)
+  private[this] def castToString: Any => Any = child.dataType match {
+case BinaryType => buildCast[Array[Byte]](_, new String(_, "UTF-8"))
+case _ => buildCast[Any](_, _.toString)
   }
 
   // BinaryConverter
-  def castToBinary: Any => Any = child.dataType match {
-case StringType => nullOrCast[String](_, _.getBytes("UTF-8"))
+  private[this] def castToBinary: Any => Any = child.dataType match {
+case StringType => buildCast[String](_, _.getBytes("UTF-8"))
   }
 
   // UDFToBoolean
-  def castToBoolean: Any => Any = child.dataType match {
-case StringType => nullOrCast[String](_, _.length() != 0)
-case TimestampType => nullOrCast[Timestamp](_, b => {(b.getTime() != 0 
|| b.getNanos() != 0)})
-case LongType => nullOrCast[Long](_, _ != 0)
-case IntegerType => nullOrCast[Int](_, _ != 0)
-case ShortType => nullOrCast[Short](_, _ != 0)
-case ByteType => nullOrCast[Byte](_, _ != 0)
-case DecimalType => nullOrCast[BigDecimal](_, _ != 0)
-case DoubleType => nullOrCast[Double](_, _ != 0)
-case FloatType => nullOrCast[Float](_, _ != 0)
+  private[this] def castToBoolean: Any => Any = child.dataType match {
+case StringType =>
+  buildCast[String](_, _.length() != 0)
+case TimestampType =>
+  buildCast[Timestamp](_, b => b.getTime() != 0 || b.getNanos() != 0)
+case LongType =>
+  buildCast[Long](_, _ != 0)
+case IntegerType =>
+  buildCast[Int](_, _ != 0)
+case ShortType =>
+  buildCast[Short](_, _ != 0)
+case ByteType =>
+  buildCast[Byte](_, _ != 0)
+case DecimalType =>
+  buildCast[BigDecimal](_, _ != 0)
+case DoubleType =>
+  buildCast[Double](_, _ != 0)
+case FloatType =>
+  buildCast[Float](_, _ != 0)
   }
 
   // TimestampConverter
-  def castToTimestamp: Any => Any = child.dataType match {
-case StringType => nullOrCast[String](_, s => {
-  // Throw away extra if more than 9 decimal places
-  val periodIdx = s.indexOf(".");
-  var n = s
-  if (periodIdx != -1) {
-if (n.length() - periodIdx > 9) {
-  n = n.substring(0, periodIdx + 10)
+  private[this] def castToTimestamp: Any => Any = child.dataType match {
+case StringType =>
+  buildCast[String](_, s => {
+// Throw away extra if more than 9 decimal places
+val periodIdx = s.indexOf(".")
+var n = s
+if (periodIdx != -1) {
+  if (n.length() - periodIdx > 9) {
+n = n.substring(0, periodIdx + 10)
+  }
 }
-  }
-  try Timestamp.valueOf(n) catch { case _: 
java.lang.IllegalArgumentException => null}
-})
-case BooleanType => nullOrCast[Boolean](_, b => new Timestamp((if(b) 1 
else 0) * 1000))
-case LongType => nullOrCast[Long](_, l => new Timestamp(l * 1000))
-case IntegerType => nullOrCast[Int](_, i => new Timestamp(i * 1000))
-case ShortType => nullOrCast[Short](_, s => new Timestamp(s * 1000))
-case ByteType => nullOrCast[Byte](_, b => new Timestamp(b * 1000))
+try Timestamp.valueOf(n) catch { case _: 
java.lang.IllegalArgumentException => null }
+  })
+case BooleanType =>
+  buildCast[Boolean](_, b => new Timestamp((if(b) 1 else 0) * 1000))
+case LongType =>
+  buildCast[Long](_, l => new Timestamp(l * 1000))

[GitHub] spark pull request: [SPARK-2218] rename Equals to EqualsTo in Spar...

2014-06-19 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1146

[SPARK-2218] rename Equals to EqualsTo in Spark SQL expressions.

Due to the existence of scala.Equals, it is very error prone to name the 
expression Equals, especially because we use a lot of partial functions and 
pattern matching in the optimizer.
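For the curious, this is the kind of silent mismatch the rename avoids (a 
standalone illustration, not Catalyst code):

```scala
object EqualsGotcha {
  // Every case class extends scala.Product, which extends scala.Equals, and
  // scala.Equals is always in scope. In a file that forgets to import the
  // Catalyst expression also named Equals, the pattern below still compiles --
  // and matches essentially everything.
  case class GreaterThan(left: Int, right: Int)

  def main(args: Array[String]): Unit = {
    val e: Any = GreaterThan(1, 2)
    e match {
      case _: Equals => println("matched scala.Equals, not the comparison expression!")
      case _         => println("no match")
    }
  }
}
```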

Note that this sits on top of #1144.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark equals

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1146.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1146


commit c4e543d9802641e4f7ddb2cc2ae08c05962a5b44
Author: Reynold Xin r...@apache.org
Date:   2014-06-20T05:35:23Z

[SPARK-2210] boolean cast on boolean value should be removed.

commit 81148d16e97d535d9a13927c1be2a9778c6e7ae5
Author: Reynold Xin r...@apache.org
Date:   2014-06-20T05:52:36Z

[SPARK-2218] rename Equals to EqualsTo in Spark SQL expressions.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SparkSQL add SkewJoin

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1134#issuecomment-46648420
  
Do you mind reformatting the code to match the Spark coding style?

https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SQL] Improve Speed of InsertIntoHiveTable

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1130#issuecomment-46648984
  
Merging this in master & branch-1.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2177][SQL] describe table result contai...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1118#issuecomment-46649113
  
Ok I'm merging this in master & branch-1.0. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1293 [SQL] Parquet support for nested ty...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/360#issuecomment-46649473
  
Ok I'm going to merge this in master & branch-1.0 now. Kinda scary but the 
change is very isolated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2185] Emit warning when task size excee...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1149#issuecomment-46649825
  
```
error 
file=/home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 message=File must end with newline character
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2210] cast to boolean on boolean value ...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1144#issuecomment-46650094
  
Ok merging this in master & branch-1.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2203: PySpark defaults to use same num r...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1138#issuecomment-46650603
  
Merging this in master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1136#issuecomment-46650863
  
BTW I really want this to go into 1.0.1, which will probably have a release 
candidate soon. So if you have a chance to rebase your PR and add the cast, 
please do. Thanks a lot, @willb!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2196] [SQL] Fix nullability of CaseWhen...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1133#issuecomment-46650936
  
Thanks. I'm merging this in master & branch-1.0. The test failure is not 
related to this change.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1143#issuecomment-46650243
  
Ok merging this in master & branch-1.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2218] rename Equals to EqualTo in Spark...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1146#issuecomment-46651438
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2218] rename Equals to EqualTo in Spark...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1146#issuecomment-46652247
  
Ok merging this in master & branch-1.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1412][SQL] Disable partial aggregation ...

2014-06-20 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1152

[SPARK-1412][SQL] Disable partial aggregation automatically when reduction 
factor is low - WIP

This is just a prototype. Kinda ugly, doesn't properly connect with the 
config system yet, and has no tests.
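The basic idea, as a self-contained sketch (not the actual patch; the sample 
size and threshold are made-up numbers):

```scala
import scala.collection.mutable

// Hash-aggregate the first `sampleSize` records; if by then the number of distinct
// keys is close to the number of records seen, map-side aggregation is not reducing
// the data, so stop aggregating and pass the rest through to the post-shuffle side.
def adaptivePartialAgg[K, V](
    records: Iterator[(K, V)],
    merge: (V, V) => V,
    sampleSize: Int = 10000,
    minReduction: Double = 0.5): Iterator[(K, V)] = {
  val partial = mutable.HashMap.empty[K, V]
  var seen = 0
  while (records.hasNext && seen < sampleSize) {
    val (k, v) = records.next()
    seen += 1
    partial(k) = partial.get(k).fold(v)(merge(_, v))
  }
  if (seen == 0 || partial.size.toDouble / seen < minReduction) {
    // Good reduction factor: keep aggregating the remaining records.
    records.foreach { case (k, v) => partial(k) = partial.get(k).fold(v)(merge(_, v)) }
    partial.iterator
  } else {
    // Low reduction factor: emit what we have plus the remaining records unaggregated.
    partial.iterator ++ records
  }
}
```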


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark partialAggDisable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1152.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1152


commit 6360e117de927b77a61b5e4f03ac52eb400c1825
Author: Reynold Xin r...@apache.org
Date:   2014-06-20T08:05:22Z

Prototype for disable partial aggregation when we don't see reduction.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1412][SQL] Disable partial aggregation ...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1152#issuecomment-46654388
  
@concretevitamin I find it hard to actually use config options in a 
physical operator. Any suggestions?




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1412][SQL] Disable partial aggregation ...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1152#issuecomment-46654585
  
@pwendell / @mateiz should we actually build this into Spark directly (i.e. 
in Aggregator)? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2219][SQL] Fix add jar to execute with ...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1154#issuecomment-46706270
  
This needs to call Spark's addJar, doesn't it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1136#issuecomment-46724012
  
I'm going to merge this in master & branch-1.0. I will create a separate 
ticket to track progress on HAVING. Basically there are two things missing 
(illustrative queries below):

1. HAVING without GROUP BY should just become a normal WHERE
2. HAVING should be able to contain aggregate expressions that don't appear 
in the aggregation list. This test contains that: 
https://github.com/apache/hive/blob/trunk/ql/src/test/queries/clientpositive/having.q
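
Roughly, the two missing cases (illustrative queries only, using the hql 
helper from the Hive test suites):

```scala
// (1) HAVING without GROUP BY -- should be planned as a plain WHERE/Filter:
hql("SELECT key FROM src HAVING key > 100")
// (2) an aggregate that appears only in the HAVING clause, not in the select list:
hql("SELECT key FROM src GROUP BY key HAVING count(value) > 1")
```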



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1136#issuecomment-46725494
  
BTW two follow up tickets created:

https://issues.apache.org/jira/browse/SPARK-2225

https://issues.apache.org/jira/browse/SPARK-2226

Let me know if you'd like to work on them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1136#issuecomment-46725451
  
There are databases that support that, and it seems to me a very simple 
change (actually just removing the check code you added is probably enough).



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1136#issuecomment-46726272
  
I actually did 2225 already. I will assign 2226 to you. Thanks!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


