date:20161213

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-13 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16030
  
Okay, I'll try to fix it that way, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16263: [SPARK-18281][SQL][PySpark] Consumes the returned...

2016-12-13 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16263#discussion_r92337930
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -400,10 +402,19 @@ def toLocalIterator(self):
 
 >>> list(df.toLocalIterator())
 [Row(age=2, name=u'Alice'), Row(age=5, name=u'Bob')]
+>>> it = df.toLocalIterator()
+>>> import time
+>>> time.sleep(5)
+>>> next(it)
--- End diff --

OK, thanks. I will update it like that.



[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...

2016-12-13 Thread uncleGen

Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16142
  
cc @vanzin 



[GitHub] spark issue #16274: [SPARK-18853][SQL] Project (UnaryNode) is way too aggres...

2016-12-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16274
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16274: [SPARK-18853][SQL] Project (UnaryNode) is way too aggres...

2016-12-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16274
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70123/
Test PASSed.



[GitHub] spark issue #16274: [SPARK-18853][SQL] Project (UnaryNode) is way too aggres...

2016-12-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16274
  
**[Test build #70123 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70123/consoleFull)**
 for PR 16274 at commit 
[`21570a7`](https://github.com/apache/spark/commit/21570a70572248428390ba0d36b335e1af0f5aa2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16266: [SPARK-18842][TESTS][LAUNCHER] De-duplicate paths in cla...

2016-12-13 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16266
  
@srowen, I think it is ready for a second look.



[GitHub] spark issue #16277: [SPARK-18854][SQL] numberedTreeString and apply(i) incon...

2016-12-13 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16277
  
LGTM



[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-13 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r92335449
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +58,93 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
+val array = ctx.freshName("array")
 
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val isPrimitiveArray = ctx.isPrimitiveType(et)
+val primitiveTypeName = if (isPrimitiveArray) 
ctx.primitiveTypeName(et) else ""
+val (preprocess, arrayData, arrayWriter) =
+  GenArrayData.getCodeArrayData(ctx, et, children.size, 
isPrimitiveArray, array)
+
+ev.copy(code =
+  preprocess +
   ctx.splitExpressions(
 ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
+evals.zipWithIndex.map { case (eval, i) =>
+  eval.code +
+(if (isPrimitiveArray) {
+  (if (!children(i).nullable) {
+s"\n$arrayWriter.write($i, ${eval.value});"
+  } else {
+s"""
+if (${eval.isNull}) {
--- End diff --

@kiszk I think what @cloud-fan means is that we don't need to check
`!children(i).nullable` to decide whether to generate the `setNull` code.




[GitHub] spark issue #14079: [SPARK-8425][CORE] Application Level Blacklisting

2016-12-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70122/
Test PASSed.



[GitHub] spark issue #14079: [SPARK-8425][CORE] Application Level Blacklisting

2016-12-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14079: [SPARK-8425][CORE] Application Level Blacklisting

2016-12-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14079
  
**[Test build #70122 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70122/consoleFull)**
 for PR 14079 at commit 
[`c95462f`](https://github.com/apache/spark/commit/c95462fe5c25d37b8658955304f739cc10ccf1f9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-13 Thread vijoshi

Github user vijoshi commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r92333556
  
--- Diff: python/pyspark/sql/context.py ---
@@ -72,8 +72,13 @@ def __init__(self, sparkContext, sparkSession=None, 
jsqlContext=None):
 self._sc = sparkContext
 self._jsc = self._sc._jsc
 self._jvm = self._sc._jvm
+
 if sparkSession is None:
-sparkSession = SparkSession(sparkContext)
+if sparkContext is SparkContext._active_spark_context:
+sparkSession = SparkSession.builder.getOrCreate()
--- End diff --

Okay. I wanted to avoid adding code to the new SparkSession class to
handle this compatibility issue arising out of the now-deprecated SQLContext
class. It looks like the Python implementation of the SparkSession builder does
not allow a SparkContext to be passed in. Do we want to change the public
builder interface for this?



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-13 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16030

After an offline discussion with @liancheng, here is the result:

**Why does the test fail?**
1. We write a parquet file with schema `[a: long, b: int]` to the path
`/data/a=1`.
2. When we read it back, we infer the data schema as `[a: long, b: int]`
and the partition schema as `[a: int]`.
3. In
[`HadoopFsRelation.schema`](https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala#L51-L56),
we merge the data schema and the partition schema, and announce to users that
the output schema will be `[a: long, b: int]`.
4. In
[`FileSourceScanExec`](https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L252-L260),
we build a parquet reader and tell it that the data schema is
`[a: long, b: int]`, the required schema is `[b: int]`, and the partition
schema is `[a: int]`.
5. In the [vectorized parquet
read](https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java#L181-L188),
we read the data from parquet files according to the required schema,
`[b: int]`, and append partition values according to the partition schema,
`[a: int]`, so the schema of the physical row data is `[b: int, a: int]`.
6. In
[`FileSourceStrategy`](https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L74-L76),
we mistakenly think the parquet scan will output data of schema
`[b: int, a: long]`, and read the second column as long type. The vectorized
parquet reader cannot read an int column as long, and throws an NPE.
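
The mismatch in the steps above can be sketched in plain Python. This is only an illustration, not Spark code: schemas are modeled as ordered lists of `(name, type)` pairs, and the variable names are made up for the example.

```python
# Illustration only: schemas as ordered lists of (column name, type name) pairs.
data_schema = [("a", "long"), ("b", "int")]   # stored in the parquet file
partition_schema = [("a", "int")]             # inferred from the path /data/a=1

partition_names = {name for name, _ in partition_schema}
data_types = dict(data_schema)

# Columns actually read from the file: data columns that are not partition columns.
required_schema = [col for col in data_schema if col[0] not in partition_names]

# What the planner expects from the scan (step 6): partition columns are
# appended, but typed according to the data schema.
expected_scan_output = required_schema + [
    (name, data_types.get(name, dtype)) for name, dtype in partition_schema
]

# What the reader really produces (step 5): partition columns are appended
# with the types from the partition schema.
actual_scan_output = required_schema + partition_schema

# Pairs of (expected, actual) columns that disagree.
mismatches = [
    (expected, actual)
    for expected, actual in zip(expected_scan_output, actual_scan_output)
    if expected != actual
]
print(mismatches)
```

The single mismatching pair is column `a`, which the planner treats as `long` while the reader fills it as `int`.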

**How to fix?**
The root cause is an open question: when the data schema includes partition
columns, how do we determine the type of those columns? Currently, at the
physical layer (the reader), we trust the partition schema, which is inferred
from directory strings. At the logical layer (`HadoopFsRelation`), we trust the
partition columns inside the data schema. This inconsistency caused the bug.

Given that we use partition values extracted from directory strings and
ignore the partition columns inside physical data files, we think it's more
reasonable to trust the partition schema.

So the fix is simple: update `HadoopFsRelation.schema` to respect the
position of partition columns in the data schema, but take their types from
the partition schema.
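
A minimal sketch of that merge rule in plain Python, assuming schemas are modeled as ordered lists of `(name, type)` pairs. The function name `merge_schema` is hypothetical and this is not the actual `HadoopFsRelation.schema` code, only the rule described above.

```python
def merge_schema(data_schema, partition_schema):
    """Merge two schemas given as ordered lists of (name, type) pairs.

    Overlapping columns keep their position from the data schema but take
    their type from the partition schema; partition columns that are absent
    from the data schema are appended at the end.
    """
    partition_types = dict(partition_schema)
    data_names = {name for name, _ in data_schema}

    # Keep the data schema order, overriding types of partition columns.
    merged = [(name, partition_types.get(name, dtype)) for name, dtype in data_schema]
    # Append partition columns that do not appear in the data files.
    merged += [(name, dtype) for name, dtype in partition_schema if name not in data_names]
    return merged


# The failing case from this thread: column `a` stays first but becomes int.
merged = merge_schema([("a", "long"), ("b", "int")], [("a", "int")])
print(merged)
```

With this rule the logical schema and the rows produced by the reader agree on the type of `a`, since both now come from the partition schema.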
