spark git commit: SPARK-5308 [BUILD] MD5 / SHA1 hash format doesn't match standard Maven output

2015-01-27 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 914267484 -> ff356e2a2


SPARK-5308 [BUILD] MD5 / SHA1 hash format doesn't match standard Maven output

Here's one way to make the hashes match what Maven's plugins would create. It 
takes a little extra footwork since OS X doesn't have the same command-line 
tools. An alternative is just to make Maven output these, of course; would that 
be better? I ask in case there is a reason I'm missing, like needing to hash 
files that Maven doesn't build.
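
For reference (an illustration, not part of the patch): Maven's checksum plugins
write a bare lowercase hex digest, while `gpg --print-md MD5` prints an uppercase,
space-grouped digest prefixed with the file name, which is why the formats did not
match. A minimal Scala sketch of the target format, assuming a readable file path:

    import java.nio.file.{Files, Paths}
    import java.security.MessageDigest

    object MavenStyleHash {
      // Bare lowercase hex, e.g. "d41d8cd98f00b204e9800998ecf8427e", the same
      // value that `md5 -q` (OS X) and `md5sum | cut` (Linux) print.
      def digest(path: String, algo: String): String = {
        val bytes = Files.readAllBytes(Paths.get(path))
        MessageDigest.getInstance(algo).digest(bytes).map(b => f"${b & 0xff}%02x").mkString
      }

      def main(args: Array[String]): Unit = {
        println(digest(args(0), "MD5"))
        println(digest(args(0), "SHA-1"))
      }
    }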

Author: Sean Owen so...@cloudera.com

Closes #4161 from srowen/SPARK-5308 and squashes the following commits:

70d09d0 [Sean Owen] Use $(...) syntax
e25eff8 [Sean Owen] Generate MD5, SHA1 hashes in a format like Maven's plugin


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ff356e2a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ff356e2a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ff356e2a

Branch: refs/heads/master
Commit: ff356e2a21e31998cda3062e560a276a3bfaa7ab
Parents: 9142674
Author: Sean Owen so...@cloudera.com
Authored: Tue Jan 27 10:22:50 2015 -0800
Committer: Patrick Wendell patr...@databricks.com
Committed: Tue Jan 27 10:22:50 2015 -0800

--
 dev/create-release/create-release.sh | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ff356e2a/dev/create-release/create-release.sh
--
diff --git a/dev/create-release/create-release.sh b/dev/create-release/create-release.sh
index b1b8cb4..b2a7e09 100755
--- a/dev/create-release/create-release.sh
+++ b/dev/create-release/create-release.sh
@@ -122,8 +122,14 @@ if [[ ! "$@" =~ --package-only ]]; then
   for file in $(find . -type f)
   do
     echo $GPG_PASSPHRASE | gpg --passphrase-fd 0 --output $file.asc \
       --detach-sig --armour $file;
-    gpg --print-md MD5 $file > $file.md5;
-    gpg --print-md SHA1 $file > $file.sha1
+    if [ $(command -v md5) ]; then
+      # Available on OS X; -q to keep only hash
+      md5 -q $file > $file.md5
+    else
+      # Available on Linux; cut to keep only hash
+      md5sum $file | cut -f1 -d' ' > $file.md5
+    fi
+    shasum -a 1 $file | cut -f1 -d' ' > $file.sha1
   done
 
   nexus_upload=$NEXUS_ROOT/deployByRepositoryId/$staged_repo_id


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-5419][Mllib] Fix the logic in Vectors.sqdist

2015-01-27 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master d6894b1c5 -> 7b0ed7979


[SPARK-5419][Mllib] Fix the logic in Vectors.sqdist

The current implementation of Vectors.sqdist is not efficient because it 
allocates temporary arrays. There is also a bug in the guard 
`v1.indices.length / v1.size < 0.5`. This PR fixes the bug and refactors sqdist 
without allocating new arrays.
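
For context (not part of the original commit message): both operands in the guard
are Ints, so the division truncates to zero and the condition holds for every
sparse vector. A minimal sketch of the pitfall, with hypothetical numbers:

    object SqdistGuardBug {
      def main(args: Array[String]): Unit = {
        val activeEntries = 3   // hypothetical sparse vector: 3 active entries
        val size = 10           // out of 10 dimensions
        // Int / Int truncates: 3 / 10 == 0, so `< 0.5` always passes
        println(activeEntries / size < 0.5)          // true, regardless of density
        // The intended density check needs floating-point division
        println(activeEntries.toDouble / size < 0.5) // true only when actually sparse
      }
    }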

Author: Liang-Chi Hsieh vii...@gmail.com

Closes #4217 from viirya/fix_sqdist and squashes the following commits:

e8b0b3d [Liang-Chi Hsieh] For review comments.
314c424 [Liang-Chi Hsieh] Fix sqdist bug.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7b0ed797
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7b0ed797
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7b0ed797

Branch: refs/heads/master
Commit: 7b0ed797958a91cda73baa7aa49ce66bfcb6b64b
Parents: d6894b1
Author: Liang-Chi Hsieh vii...@gmail.com
Authored: Tue Jan 27 01:29:14 2015 -0800
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue Jan 27 01:29:14 2015 -0800

--
 .../org/apache/spark/mllib/linalg/Vectors.scala  | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7b0ed797/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
index b3022ad..2834ea7 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
@@ -371,18 +371,23 @@ object Vectors {
           squaredDistance += score * score
         }
 
-      case (v1: SparseVector, v2: DenseVector) if v1.indices.length / v1.size < 0.5 =>
+      case (v1: SparseVector, v2: DenseVector) =>
         squaredDistance = sqdist(v1, v2)
 
-      case (v1: DenseVector, v2: SparseVector) if v2.indices.length / v2.size < 0.5 =>
+      case (v1: DenseVector, v2: SparseVector) =>
         squaredDistance = sqdist(v2, v1)
 
-      // When a SparseVector is approximately dense, we treat it as a DenseVector
-      case (v1, v2) =>
-        squaredDistance = v1.toArray.zip(v2.toArray).foldLeft(0.0){ (distance, elems) =>
-          val score = elems._1 - elems._2
-          distance + score * score
+      case (DenseVector(vv1), DenseVector(vv2)) =>
+        var kv = 0
+        val sz = vv1.size
+        while (kv < sz) {
+          val score = vv1(kv) - vv2(kv)
+          squaredDistance += score * score
+          kv += 1
         }
+      case _ =>
+        throw new IllegalArgumentException("Do not support vector type " + v1.getClass +
+          " and " + v2.getClass)
     }
     squaredDistance
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[5/5] spark git commit: [SPARK-5097][SQL] DataFrame

2015-01-27 Thread rxin
[SPARK-5097][SQL] DataFrame

This pull request redesigns the existing Spark SQL DSL, which already provides 
data-frame-like functionality.

TODOs:
With the exception of Python support, other tasks can be done in separate, 
follow-up PRs.
- [ ] Audit of the API
- [ ] Documentation
- [ ] More test cases to cover the new API
- [x] Python support
- [ ] Type alias SchemaRDD
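
A rough usage sketch of the new API, adapted from the Scaladoc added in this commit
(the Parquet paths, the `sqlContext`, and the dsl imports are assumptions):

    import org.apache.spark.sql.dsl._

    val people = sqlContext.parquetFile("people.parquet")
    val department = sqlContext.parquetFile("department.parquet")

    people.filter("age > 30")
      .join(department, people("deptId") === department("id"))
      .groupBy(department("name"), "gender")
      .agg(avg(people("salary")), max(people("age")))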

Author: Reynold Xin r...@databricks.com
Author: Davies Liu dav...@databricks.com

Closes #4173 from rxin/df1 and squashes the following commits:

0a1a73b [Reynold Xin] Merge branch 'df1' of github.com:rxin/spark into df1
23b4427 [Reynold Xin] Mima.
828f70d [Reynold Xin] Merge pull request #7 from davies/df
257b9e6 [Davies Liu] add repartition
6bf2b73 [Davies Liu] fix collect with UDT and tests
e971078 [Reynold Xin] Missing quotes.
b9306b4 [Reynold Xin] Remove removeColumn/updateColumn for now.
a728bf2 [Reynold Xin] Example rename.
e8aa3d3 [Reynold Xin] groupby -> groupBy.
9662c9e [Davies Liu] improve DataFrame Python API
4ae51ea [Davies Liu] python API for dataframe
1e5e454 [Reynold Xin] Fixed a bug with symbol conversion.
2ca74db [Reynold Xin] Couple minor fixes.
ea98ea1 [Reynold Xin] Documentation & literal expressions.
2b22684 [Reynold Xin] Got rid of IntelliJ problems.
02bbfbc [Reynold Xin] Tightening imports.
ffbce66 [Reynold Xin] Fixed compilation error.
59b6d8b [Reynold Xin] Style violation.
b85edfb [Reynold Xin] ALS.
8c37f0a [Reynold Xin] Made MLlib and examples compile
6d53134 [Reynold Xin] Hive module.
d35efd5 [Reynold Xin] Fixed compilation error.
ce4a5d2 [Reynold Xin] Fixed test cases in SQL except ParquetIOSuite.
66d5ef1 [Reynold Xin] SQLContext minor patch.
c9bcdc0 [Reynold Xin] Checkpoint: SQL module compiles!


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/119f45d6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/119f45d6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/119f45d6

Branch: refs/heads/master
Commit: 119f45d61d7b48d376cca05e1b4f0c7fcf65bfa8
Parents: b1b35ca
Author: Reynold Xin r...@databricks.com
Authored: Tue Jan 27 16:08:24 2015 -0800
Committer: Reynold Xin r...@databricks.com
Committed: Tue Jan 27 16:08:24 2015 -0800

--
 .../examples/ml/JavaCrossValidatorExample.java  |  10 +-
 .../examples/ml/JavaSimpleParamsExample.java|  12 +-
 .../JavaSimpleTextClassificationPipeline.java   |  10 +-
 .../apache/spark/examples/sql/JavaSparkSQL.java |  36 +-
 .../src/main/python/mllib/dataset_example.py|   2 +-
 examples/src/main/python/sql.py |  16 +-
 .../examples/ml/CrossValidatorExample.scala |   3 +-
 .../apache/spark/examples/ml/MovieLensALS.scala |   2 +-
 .../spark/examples/ml/SimpleParamsExample.scala |   5 +-
 .../ml/SimpleTextClassificationPipeline.scala   |   3 +-
 .../spark/examples/mllib/DatasetExample.scala   |  28 +-
 .../apache/spark/examples/sql/RDDRelation.scala |   6 +-
 .../scala/org/apache/spark/ml/Estimator.scala   |   8 +-
 .../scala/org/apache/spark/ml/Evaluator.scala   |   4 +-
 .../scala/org/apache/spark/ml/Pipeline.scala|   6 +-
 .../scala/org/apache/spark/ml/Transformer.scala |  17 +-
 .../ml/classification/LogisticRegression.scala  |  14 +-
 .../BinaryClassificationEvaluator.scala |   7 +-
 .../spark/ml/feature/StandardScaler.scala   |  15 +-
 .../apache/spark/ml/recommendation/ALS.scala|  37 +-
 .../apache/spark/ml/tuning/CrossValidator.scala |   8 +-
 .../org/apache/spark/ml/JavaPipelineSuite.java  |   6 +-
 .../JavaLogisticRegressionSuite.java|   8 +-
 .../ml/tuning/JavaCrossValidatorSuite.java  |   4 +-
 .../org/apache/spark/ml/PipelineSuite.scala |  14 +-
 .../LogisticRegressionSuite.scala   |  16 +-
 .../spark/ml/recommendation/ALSSuite.scala  |   4 +-
 .../spark/ml/tuning/CrossValidatorSuite.scala   |   4 +-
 project/MimaExcludes.scala  |  15 +-
 python/pyspark/java_gateway.py  |   7 +-
 python/pyspark/sql.py   | 967 ++-
 python/pyspark/tests.py | 155 +--
 .../analysis/MultiInstanceRelation.scala|   2 +-
 .../catalyst/expressions/namedExpressions.scala |   3 +
 .../spark/sql/catalyst/plans/joinTypes.scala|  15 +
 .../catalyst/plans/logical/TestRelation.scala   |   8 +-
 .../org/apache/spark/sql/CacheManager.scala |   8 +-
 .../scala/org/apache/spark/sql/Column.scala | 528 ++
 .../scala/org/apache/spark/sql/DataFrame.scala  | 596 
 .../org/apache/spark/sql/GroupedDataFrame.scala | 139 +++
 .../scala/org/apache/spark/sql/Literal.scala|  98 ++
 .../scala/org/apache/spark/sql/SQLContext.scala |  85 +-
 .../scala/org/apache/spark/sql/SchemaRDD.scala  | 511 --
 .../org/apache/spark/sql/SchemaRDDLike.scala| 139 ---
 .../main/scala/org/apache/spark/sql/api.scala   | 289 ++
 

[1/5] spark git commit: [SPARK-5097][SQL] DataFrame

2015-01-27 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master b1b35ca2e -> 119f45d61


http://git-wip-us.apache.org/repos/asf/spark/blob/119f45d6/sql/core/src/test/scala/org/apache/spark/sql/execution/TgfSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/TgfSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/TgfSuite.scala
deleted file mode 100644
index 272c0d4..000
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/TgfSuite.scala
+++ /dev/null
@@ -1,65 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.execution
-
-import org.apache.spark.sql.QueryTest
-import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.plans._
-
-/* Implicit conversions */
-import org.apache.spark.sql.test.TestSQLContext._
-
-/**
- * This is an example TGF that uses UnresolvedAttributes 'name and 'age to access specific columns
- * from the input data.  These will be replaced during analysis with specific AttributeReferences
- * and then bound to specific ordinals during query planning. While TGFs could also access specific
- * columns using hand-coded ordinals, doing so violates data independence.
- *
- * Note: this is only a rough example of how TGFs can be expressed, the final version will likely
- * involve a lot more sugar for cleaner use in Scala/Java/etc.
- */
-case class ExampleTGF(input: Seq[Expression] = Seq('name, 'age)) extends Generator {
-  def children = input
-  protected def makeOutput() = 'nameAndAge.string :: Nil
-
-  val Seq(nameAttr, ageAttr) = input
-
-  override def eval(input: Row): TraversableOnce[Row] = {
-    val name = nameAttr.eval(input)
-    val age = ageAttr.eval(input).asInstanceOf[Int]
-
-    Iterator(
-      new GenericRow(Array[Any](s"$name is $age years old")),
-      new GenericRow(Array[Any](s"Next year, $name will be ${age + 1} years old")))
-  }
-}
-
-class TgfSuite extends QueryTest {
-  val inputData =
-    logical.LocalRelation('name.string, 'age.int).loadData(
-      ("michael", 29) :: Nil
-    )
-
-  test("simple tgf example") {
-    checkAnswer(
-      inputData.generate(ExampleTGF()),
-      Seq(
-        Row("michael is 29 years old"),
-        Row("Next year, michael will be 30 years old")))
-  }
-}

http://git-wip-us.apache.org/repos/asf/spark/blob/119f45d6/sql/core/src/test/scala/org/apache/spark/sql/json/JsonSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/json/JsonSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/json/JsonSuite.scala
index 94d14ac..ef198f8 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/json/JsonSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/json/JsonSuite.scala
@@ -21,11 +21,12 @@ import java.sql.{Date, Timestamp}
 
 import org.apache.spark.sql.TestData._
 import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.dsl._
 import org.apache.spark.sql.json.JsonRDD.{compatibleType, enforceCorrectType}
 import org.apache.spark.sql.test.TestSQLContext
 import org.apache.spark.sql.test.TestSQLContext._
 import org.apache.spark.sql.types._
-import org.apache.spark.sql.{QueryTest, Row, SQLConf}
+import org.apache.spark.sql.{Literal, QueryTest, Row, SQLConf}
 
 class JsonSuite extends QueryTest {
   import org.apache.spark.sql.json.TestJsonData._
@@ -463,8 +464,8 @@ class JsonSuite extends QueryTest {
     // in the Project.
     checkAnswer(
       jsonSchemaRDD.
-        where('num_str > BigDecimal("92233720368547758060")).
-        select('num_str + 1.2 as Symbol("num")),
+        where('num_str > Literal(BigDecimal("92233720368547758060"))).
+        select(('num_str + Literal(1.2)).as("num")),
       Row(new java.math.BigDecimal("92233720368547758061.2"))
     )
 
@@ -820,7 +821,7 @@ class JsonSuite extends QueryTest {
 
     val schemaRDD1 = applySchema(rowRDD1, schema1)
     schemaRDD1.registerTempTable("applySchema1")
-    val schemaRDD2 = schemaRDD1.toSchemaRDD
+    val schemaRDD2 = schemaRDD1.toDF
     val result = schemaRDD2.toJSON.collect()
    assert(result(0) ==

[4/5] spark git commit: [SPARK-5097][SQL] DataFrame

2015-01-27 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/119f45d6/python/pyspark/sql.py
--
diff --git a/python/pyspark/sql.py b/python/pyspark/sql.py
index 1990323..7d7550c 100644
--- a/python/pyspark/sql.py
+++ b/python/pyspark/sql.py
@@ -20,15 +20,19 @@ public classes of Spark SQL:
 
     - L{SQLContext}
       Main entry point for SQL functionality.
-    - L{SchemaRDD}
+    - L{DataFrame}
       A Resilient Distributed Dataset (RDD) with Schema information for the data contained. In
-      addition to normal RDD operations, SchemaRDDs also support SQL.
+      addition to normal RDD operations, DataFrames also support SQL.
+    - L{GroupedDataFrame}
+    - L{Column}
+      Column is a DataFrame with a single column.
     - L{Row}
       A Row of data returned by a Spark SQL query.
     - L{HiveContext}
       Main entry point for accessing data stored in Apache Hive..
 """
 
+import sys
 import itertools
 import decimal
 import datetime
@@ -36,6 +40,9 @@ import keyword
 import warnings
 import json
 import re
+import random
+import os
+from tempfile import NamedTemporaryFile
 from array import array
 from operator import itemgetter
 from itertools import imap
@@ -43,6 +50,7 @@ from itertools import imap
 from py4j.protocol import Py4JError
 from py4j.java_collections import ListConverter, MapConverter
 
+from pyspark.context import SparkContext
 from pyspark.rdd import RDD
 from pyspark.serializers import BatchedSerializer, AutoBatchedSerializer, PickleSerializer, \
     CloudPickleSerializer, UTF8Deserializer
@@ -54,7 +62,8 @@ __all__ = [
     "StringType", "BinaryType", "BooleanType", "DateType", "TimestampType", "DecimalType",
     "DoubleType", "FloatType", "ByteType", "IntegerType", "LongType",
     "ShortType", "ArrayType", "MapType", "StructField", "StructType",
-    "SQLContext", "HiveContext", "SchemaRDD", "Row"]
+    "SQLContext", "HiveContext", "DataFrame", "GroupedDataFrame", "Column", "Row",
+    "SchemaRDD"]
 
 
 class DataType(object):
@@ -1171,7 +1180,7 @@ def _create_cls(dataType):
 
     class Row(tuple):
 
-        """ Row in SchemaRDD """
+        """ Row in DataFrame """
         __DATATYPE__ = dataType
         __FIELDS__ = tuple(f.name for f in dataType.fields)
         __slots__ = ()
@@ -1198,7 +1207,7 @@ class SQLContext(object):
 
     """Main entry point for Spark SQL functionality.
 
-    A SQLContext can be used create L{SchemaRDD}, register L{SchemaRDD} as
+    A SQLContext can be used create L{DataFrame}, register L{DataFrame} as
     tables, execute SQL over tables, cache tables, and read parquet files.
 
 
@@ -1209,8 +1218,8 @@ class SQLContext(object):
     :param sqlContext: An optional JVM Scala SQLContext. If set, we do not instatiate a new
         SQLContext in the JVM, instead we make all calls to this object.
 
-        >>> srdd = sqlCtx.inferSchema(rdd)
-        >>> sqlCtx.inferSchema(srdd)  # doctest: +IGNORE_EXCEPTION_DETAIL
+        >>> df = sqlCtx.inferSchema(rdd)
+        >>> sqlCtx.inferSchema(df)  # doctest: +IGNORE_EXCEPTION_DETAIL
         Traceback (most recent call last):
             ...
         TypeError:...
@@ -1225,12 +1234,12 @@ class SQLContext(object):
         >>> allTypes = sc.parallelize([Row(i=1, s="string", d=1.0, l=1L,
         ...     b=True, list=[1, 2, 3], dict={"s": 0}, row=Row(a=1),
         ...     time=datetime(2014, 8, 1, 14, 1, 5))])
-        >>> srdd = sqlCtx.inferSchema(allTypes)
-        >>> srdd.registerTempTable("allTypes")
+        >>> df = sqlCtx.inferSchema(allTypes)
+        >>> df.registerTempTable("allTypes")
         >>> sqlCtx.sql('select i+1, d+1, not b, list[1], dict["s"], time, row.a '
         ...            'from allTypes where b and i > 0').collect()
         [Row(c0=2, c1=2.0, c2=False, c3=2, c4=0...8, 1, 14, 1, 5), a=1)]
-        >>> srdd.map(lambda x: (x.i, x.s, x.d, x.l, x.b, x.time,
+        >>> df.map(lambda x: (x.i, x.s, x.d, x.l, x.b, x.time,
         ...                     x.row.a, x.list)).collect()
         [(1, u'string', 1.0, 1, True, ...(2014, 8, 1, 14, 1, 5), 1, [1, 2, 3])]
 
@@ -1309,23 +1318,23 @@ class SQLContext(object):
         ...     [Row(field1=1, field2="row1"),
         ...      Row(field1=2, field2="row2"),
         ...      Row(field1=3, field2="row3")])
-        >>> srdd = sqlCtx.inferSchema(rdd)
-        >>> srdd.collect()[0]
+        >>> df = sqlCtx.inferSchema(rdd)
+        >>> df.collect()[0]
         Row(field1=1, field2=u'row1')
 
         >>> NestedRow = Row("f1", "f2")
         >>> nestedRdd1 = sc.parallelize([
         ...     NestedRow(array('i', [1, 2]), {"row1": 1.0}),
         ...     NestedRow(array('i', [2, 3]), {"row2": 2.0})])
-        >>> srdd = sqlCtx.inferSchema(nestedRdd1)
-        >>> srdd.collect()
+        >>> df = sqlCtx.inferSchema(nestedRdd1)
+        >>> df.collect()
         [Row(f1=[1, 2], f2={u'row1': 1.0}), ..., f2={u'row2': 2.0})]
 
         >>> nestedRdd2 = sc.parallelize([
         ...     NestedRow([[1, 2], [2, 3]], [1, 2]),
         ...     NestedRow([[2, 3], [3, 4]], [2, 3])])
-        >>> srdd = sqlCtx.inferSchema(nestedRdd2)
- 

[3/5] spark git commit: [SPARK-5097][SQL] DataFrame

2015-01-27 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/119f45d6/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala
new file mode 100644
index 000..d0bb364
--- /dev/null
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala
@@ -0,0 +1,596 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql
+
+import scala.language.implicitConversions
+import scala.reflect.ClassTag
+import scala.collection.JavaConversions._
+
+import java.util.{ArrayList, List => JList}
+
+import com.fasterxml.jackson.core.JsonFactory
+import net.razorvine.pickle.Pickler
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.api.python.SerDeUtil
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.sql.catalyst.ScalaReflection
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.{Literal => LiteralExpr}
+import org.apache.spark.sql.catalyst.plans.{JoinType, Inner}
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.execution.{LogicalRDD, EvaluatePython}
+import org.apache.spark.sql.json.JsonRDD
+import org.apache.spark.sql.types.{NumericType, StructType}
+import org.apache.spark.util.Utils
+
+
+/**
+ * A collection of rows that have the same columns.
+ *
+ * A [[DataFrame]] is equivalent to a relational table in Spark SQL, and can be created using
+ * various functions in [[SQLContext]].
+ * {{{
+ *   val people = sqlContext.parquetFile("...")
+ * }}}
+ *
+ * Once created, it can be manipulated using the various domain-specific-language (DSL) functions
+ * defined in: [[DataFrame]] (this class), [[Column]], and [[dsl]] for Scala DSL.
+ *
+ * To select a column from the data frame, use the apply method:
+ * {{{
+ *   val ageCol = people("age")  // in Scala
+ *   Column ageCol = people.apply("age")  // in Java
+ * }}}
+ *
+ * Note that the [[Column]] type can also be manipulated through its various functions.
+ * {{
+ *   // The following creates a new column that increases everybody's age by 10.
+ *   people("age") + 10  // in Scala
+ * }}
+ *
+ * A more concrete example:
+ * {{{
+ *   // To create DataFrame using SQLContext
+ *   val people = sqlContext.parquetFile("...")
+ *   val department = sqlContext.parquetFile("...")
+ *
+ *   people.filter("age > 30")
+ *     .join(department, people("deptId") === department("id"))
+ *     .groupBy(department("name"), "gender")
+ *     .agg(avg(people("salary")), max(people("age")))
+ * }}}
+ */
+// TODO: Improve documentation.
+class DataFrame protected[sql](
+    val sqlContext: SQLContext,
+    private val baseLogicalPlan: LogicalPlan,
+    operatorsEnabled: Boolean)
+  extends DataFrameSpecificApi with RDDApi[Row] {
+
+  protected[sql] def this(sqlContext: Option[SQLContext], plan: Option[LogicalPlan]) =
+    this(sqlContext.orNull, plan.orNull, sqlContext.isDefined && plan.isDefined)
+
+  protected[sql] def this(sqlContext: SQLContext, plan: LogicalPlan) = this(sqlContext, plan, true)
+
+  @transient protected[sql] lazy val queryExecution = sqlContext.executePlan(baseLogicalPlan)
+
+  @transient protected[sql] val logicalPlan: LogicalPlan = baseLogicalPlan match {
+    // For various commands (like DDL) and queries with side effects, we force query optimization to
+    // happen right away to let these side effects take place eagerly.
+    case _: Command | _: InsertIntoTable | _: CreateTableAsSelect[_] | _: WriteToFile =>
+      LogicalRDD(queryExecution.analyzed.output, queryExecution.toRdd)(sqlContext)
+    case _ =>
+      baseLogicalPlan
+  }
+
+  /**
+   * An implicit conversion function internal to this class for us to avoid doing
+   * new DataFrame(...) everywhere.
+   */
+  private[this] implicit def toDataFrame(logicalPlan: LogicalPlan): DataFrame = {
+    new DataFrame(sqlContext, logicalPlan, true)
+  }
+
+  /** Return the list of numeric columns, useful

spark git commit: SPARK-5199. FS read metrics should support CombineFileSplits and track bytes from all FSs

2015-01-27 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master fdaad4eb0 -> b1b35ca2e


SPARK-5199. FS read metrics should support CombineFileSplits and track bytes 
from all FSs

...mbineFileSplits

Author: Sandy Ryza sa...@cloudera.com

Closes #4050 from sryza/sandy-spark-5199 and squashes the following commits:

864514b [Sandy Ryza] Add tests and fix bug
0d504f1 [Sandy Ryza] Prettify
915c7e6 [Sandy Ryza] Get metrics from all filesystems
cdbc3e8 [Sandy Ryza] SPARK-5199. Input metrics should show up for InputFormats 
that return CombineFileSplits
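
Not part of the patch, but for context: the change stops filtering Statistics by
the split's URI scheme and instead sums thread-local byte counts across every
FileSystem. A sketch of the baseline-and-delta callback pattern the diff below
relies on (names are illustrative):

    // Snapshot a baseline when the task starts, then report the delta,
    // so bytes read before the task are not attributed to it.
    def bytesReadCallback(currentBytesRead: () => Long): () => Long = {
      val baseline = currentBytesRead()
      () => currentBytesRead() - baseline
    }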


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b1b35ca2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b1b35ca2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b1b35ca2

Branch: refs/heads/master
Commit: b1b35ca2e440df40b253bf967bb93705d355c1c0
Parents: fdaad4e
Author: Sandy Ryza sa...@cloudera.com
Authored: Tue Jan 27 15:42:55 2015 -0800
Committer: Patrick Wendell patr...@databricks.com
Committed: Tue Jan 27 15:42:55 2015 -0800

--
 .../apache/spark/deploy/SparkHadoopUtil.scala   | 16 ++--
 .../org/apache/spark/executor/TaskMetrics.scala |  1 -
 .../scala/org/apache/spark/rdd/HadoopRDD.scala  | 12 ++-
 .../org/apache/spark/rdd/NewHadoopRDD.scala | 15 +--
 .../org/apache/spark/rdd/PairRDDFunctions.scala | 11 +--
 .../spark/metrics/InputOutputMetricsSuite.scala | 97 +++-
 6 files changed, 120 insertions(+), 32 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b1b35ca2/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
index 57f9faf..211e3ed 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
@@ -133,10 +133,9 @@ class SparkHadoopUtil extends Logging {
    * statistics are only available as of Hadoop 2.5 (see HADOOP-10688).
    * Returns None if the required method can't be found.
    */
-  private[spark] def getFSBytesReadOnThreadCallback(path: Path, conf: Configuration)
-    : Option[() => Long] = {
+  private[spark] def getFSBytesReadOnThreadCallback(): Option[() => Long] = {
     try {
-      val threadStats = getFileSystemThreadStatistics(path, conf)
+      val threadStats = getFileSystemThreadStatistics()
       val getBytesReadMethod = getFileSystemThreadStatisticsMethod("getBytesRead")
       val f = () => threadStats.map(getBytesReadMethod.invoke(_).asInstanceOf[Long]).sum
       val baselineBytesRead = f()
@@ -156,10 +155,9 @@ class SparkHadoopUtil extends Logging {
    * statistics are only available as of Hadoop 2.5 (see HADOOP-10688).
    * Returns None if the required method can't be found.
    */
-  private[spark] def getFSBytesWrittenOnThreadCallback(path: Path, conf: Configuration)
-    : Option[() => Long] = {
+  private[spark] def getFSBytesWrittenOnThreadCallback(): Option[() => Long] = {
     try {
-      val threadStats = getFileSystemThreadStatistics(path, conf)
+      val threadStats = getFileSystemThreadStatistics()
       val getBytesWrittenMethod = getFileSystemThreadStatisticsMethod("getBytesWritten")
       val f = () => threadStats.map(getBytesWrittenMethod.invoke(_).asInstanceOf[Long]).sum
       val baselineBytesWritten = f()
@@ -172,10 +170,8 @@ class SparkHadoopUtil extends Logging {
     }
   }
 
-  private def getFileSystemThreadStatistics(path: Path, conf: Configuration): Seq[AnyRef] = {
-    val qualifiedPath = path.getFileSystem(conf).makeQualified(path)
-    val scheme = qualifiedPath.toUri().getScheme()
-    val stats = FileSystem.getAllStatistics().filter(_.getScheme().equals(scheme))
+  private def getFileSystemThreadStatistics(): Seq[AnyRef] = {
+    val stats = FileSystem.getAllStatistics()
     stats.map(Utils.invoke(classOf[Statistics], _, "getThreadStatistics"))
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/b1b35ca2/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
--
diff --git a/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala b/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
index ddb5903..97912c6 100644
--- a/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
+++ b/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
@@ -19,7 +19,6 @@ package org.apache.spark.executor
 
 import java.util.concurrent.atomic.AtomicLong
 
-import org.apache.spark.executor.DataReadMethod
 import org.apache.spark.executor.DataReadMethod.DataReadMethod
 
 import scala.collection.mutable.ArrayBuffer


[1/2] spark git commit: Preparing development version 1.2.2-SNAPSHOT

2015-01-27 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 4026bba79 -> 0a16abadc


Preparing development version 1.2.2-SNAPSHOT


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0a16abad
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0a16abad
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0a16abad

Branch: refs/heads/branch-1.2
Commit: 0a16abadc59082b7d3a24d7f3625236658632813
Parents: b77f876
Author: Patrick Wendell patr...@databricks.com
Authored: Wed Jan 28 07:48:55 2015 +
Committer: Patrick Wendell patr...@databricks.com
Committed: Wed Jan 28 07:48:55 2015 +

--
 assembly/pom.xml  | 2 +-
 bagel/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 examples/pom.xml  | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka/pom.xml| 2 +-
 external/mqtt/pom.xml | 2 +-
 external/twitter/pom.xml  | 2 +-
 external/zeromq/pom.xml   | 2 +-
 extras/java8-tests/pom.xml| 2 +-
 extras/kinesis-asl/pom.xml| 2 +-
 extras/spark-ganglia-lgpl/pom.xml | 2 +-
 graphx/pom.xml| 2 +-
 mllib/pom.xml | 2 +-
 network/common/pom.xml| 2 +-
 network/shuffle/pom.xml   | 2 +-
 network/yarn/pom.xml  | 2 +-
 pom.xml   | 2 +-
 repl/pom.xml  | 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 yarn/alpha/pom.xml| 2 +-
 yarn/pom.xml  | 2 +-
 yarn/stable/pom.xml   | 2 +-
 29 files changed, 29 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0a16abad/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index d731003..6889a6c 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1</version>
+    <version>1.2.2-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/0a16abad/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index 8374612..f785cf6 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1</version>
+    <version>1.2.2-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/0a16abad/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index ceeabd6..9e202f3 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1</version>
+    <version>1.2.2-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/0a16abad/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index c7ad4dc..df6975f 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1</version>
+    <version>1.2.2-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/0a16abad/external/flume-sink/pom.xml
--
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index e0b3eae..0002bf2 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1</version>
+    <version>1.2.2-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/0a16abad/external/flume/pom.xml
--
diff --git a/external/flume/pom.xml b/external/flume/pom.xml
index f9559c1..e783d39 100644
--- a/external/flume/pom.xml
+++ b/external/flume/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1</version>
+ 

[1/2] spark git commit: Revert Preparing development version 1.2.2-SNAPSHOT

2015-01-27 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 fea9b43ef -> 4026bba79


Revert Preparing development version 1.2.2-SNAPSHOT

This reverts commit f53a4319ba5f0843c077e64ae5a41e2fac835a5b.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/063a4c50
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/063a4c50
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/063a4c50

Branch: refs/heads/branch-1.2
Commit: 063a4c5037812739ce4a6370543c6e54d8956104
Parents: fea9b43
Author: Patrick Wendell patr...@databricks.com
Authored: Tue Jan 27 23:46:57 2015 -0800
Committer: Patrick Wendell patr...@databricks.com
Committed: Tue Jan 27 23:46:57 2015 -0800

--
 assembly/pom.xml  | 2 +-
 bagel/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 examples/pom.xml  | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka/pom.xml| 2 +-
 external/mqtt/pom.xml | 2 +-
 external/twitter/pom.xml  | 2 +-
 external/zeromq/pom.xml   | 2 +-
 extras/java8-tests/pom.xml| 2 +-
 extras/kinesis-asl/pom.xml| 2 +-
 extras/spark-ganglia-lgpl/pom.xml | 2 +-
 graphx/pom.xml| 2 +-
 mllib/pom.xml | 2 +-
 network/common/pom.xml| 2 +-
 network/shuffle/pom.xml   | 2 +-
 network/yarn/pom.xml  | 2 +-
 pom.xml   | 2 +-
 repl/pom.xml  | 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 yarn/alpha/pom.xml| 2 +-
 yarn/pom.xml  | 2 +-
 yarn/stable/pom.xml   | 2 +-
 29 files changed, 29 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/063a4c50/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 6889a6c..d731003 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.2-SNAPSHOT</version>
+    <version>1.2.1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/063a4c50/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index f785cf6..8374612 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.2-SNAPSHOT</version>
+    <version>1.2.1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/063a4c50/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 9e202f3..ceeabd6 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.2-SNAPSHOT</version>
+    <version>1.2.1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/063a4c50/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index df6975f..c7ad4dc 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.2-SNAPSHOT</version>
+    <version>1.2.1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/063a4c50/external/flume-sink/pom.xml
--
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index 0002bf2..e0b3eae 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.2-SNAPSHOT</version>
+    <version>1.2.1</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/063a4c50/external/flume/pom.xml
--
diff --git a/external/flume/pom.xml b/external/flume/pom.xml
index e783d39..f9559c1 100644
--- a/external/flume/pom.xml
+++ b/external/flume/pom.xml
@@ -21,7 +21,7 @@
   <parent>
 

Git Push Summary

2015-01-27 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.2.1-rc2 [created] b77f87673

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[2/2] spark git commit: Preparing Spark release v1.2.1-rc2

2015-01-27 Thread pwendell
Preparing Spark release v1.2.1-rc2


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b77f8767
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b77f8767
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b77f8767

Branch: refs/heads/branch-1.2
Commit: b77f87673d1f9f03d4c83cf583158227c551359b
Parents: 4026bba
Author: Patrick Wendell patr...@databricks.com
Authored: Wed Jan 28 07:48:55 2015 +
Committer: Patrick Wendell patr...@databricks.com
Committed: Wed Jan 28 07:48:55 2015 +

--
 assembly/pom.xml  | 2 +-
 bagel/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 examples/pom.xml  | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka/pom.xml| 2 +-
 external/mqtt/pom.xml | 2 +-
 external/twitter/pom.xml  | 2 +-
 external/zeromq/pom.xml   | 2 +-
 extras/java8-tests/pom.xml| 2 +-
 extras/kinesis-asl/pom.xml| 2 +-
 extras/spark-ganglia-lgpl/pom.xml | 2 +-
 graphx/pom.xml| 2 +-
 mllib/pom.xml | 2 +-
 network/common/pom.xml| 2 +-
 network/shuffle/pom.xml   | 2 +-
 network/yarn/pom.xml  | 2 +-
 pom.xml   | 2 +-
 repl/pom.xml  | 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 yarn/alpha/pom.xml| 2 +-
 yarn/pom.xml  | 2 +-
 yarn/stable/pom.xml   | 2 +-
 29 files changed, 29 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b77f8767/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 65e3ddf..d731003 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1-SNAPSHOT</version>
+    <version>1.2.1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/b77f8767/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index 4ead7aa..8374612 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1-SNAPSHOT</version>
+    <version>1.2.1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/b77f8767/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 155b4c9..ceeabd6 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1-SNAPSHOT</version>
+    <version>1.2.1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/b77f8767/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index f5a7ed2..c7ad4dc 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1-SNAPSHOT</version>
+    <version>1.2.1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/b77f8767/external/flume-sink/pom.xml
--
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index fe1c8fb..e0b3eae 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1-SNAPSHOT</version>
+    <version>1.2.1</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/b77f8767/external/flume/pom.xml
--
diff --git a/external/flume/pom.xml b/external/flume/pom.xml
index da4bd70..f9559c1 100644
--- a/external/flume/pom.xml
+++ b/external/flume/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.2.1-SNAPSHOT</version>
+    <version>1.2.1</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>