[GitHub] AmplabJenkins removed a comment on issue #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23602: [SPARK-26674][CORE]Consolidate 
CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#issuecomment-464304557
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102409/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23602: [SPARK-26674][CORE]Consolidate 
CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#issuecomment-464304555
 
 
   Merged build finished. Test FAILed.





[GitHub] AmplabJenkins commented on issue #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23602: [SPARK-26674][CORE]Consolidate 
CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#issuecomment-464304557
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102409/
   Test FAILed.





[GitHub] AmplabJenkins commented on issue #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23602: [SPARK-26674][CORE]Consolidate 
CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#issuecomment-464304555
 
 
   Merged build finished. Test FAILed.





[GitHub] SparkQA removed a comment on issue #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
SparkQA removed a comment on issue #23602: [SPARK-26674][CORE]Consolidate 
CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#issuecomment-464278279
 
 
   **[Test build #102409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102409/testReport)** for PR 23602 at commit [`3aad18a`](https://github.com/apache/spark/commit/3aad18a4ba96b5717c16ebc8a0d23b0a3986c634).





[GitHub] SparkQA commented on issue #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
SparkQA commented on issue #23602: [SPARK-26674][CORE]Consolidate 
CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#issuecomment-464304377
 
 
   **[Test build #102409 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102409/testReport)** for PR 23602 at commit [`3aad18a`](https://github.com/apache/spark/commit/3aad18a4ba96b5717c16ebc8a0d23b0a3986c634).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] HyukjinKwon closed pull request #23800: [SPARK-26673][FollowUp][SQL] File source V2: remove duplicated broadcast object in FileWriterFactory

2019-02-15 Thread GitBox
HyukjinKwon closed pull request #23800: [SPARK-26673][FollowUp][SQL] File 
source V2: remove duplicated broadcast object in FileWriterFactory
URL: https://github.com/apache/spark/pull/23800
 
 
   





[GitHub] HyukjinKwon commented on issue #23800: [SPARK-26673][FollowUp][SQL] File source V2: remove duplicated broadcast object in FileWriterFactory

2019-02-15 Thread GitBox
HyukjinKwon commented on issue #23800: [SPARK-26673][FollowUp][SQL] File source 
V2: remove duplicated broadcast object in FileWriterFactory
URL: https://github.com/apache/spark/pull/23800#issuecomment-464301095
 
 
   Merged to master.





[GitHub] HyukjinKwon commented on a change in pull request #23799: [SPARK-26892]Fix saveAsTextFile throws NullPointerException when null row present

2019-02-15 Thread GitBox
HyukjinKwon commented on a change in pull request #23799: [SPARK-26892]Fix 
saveAsTextFile throws NullPointerException when null row present
URL: https://github.com/apache/spark/pull/23799#discussion_r257449106
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala
 ##
 @@ -1507,7 +1507,8 @@ abstract class RDD[T: ClassTag](
     val r = this.mapPartitions { iter =>
       val text = new Text()
       iter.map { x =>
-        text.set(x.toString)
+        val value = if (x != null) x.toString else "Null"
+        text.set(value)
 
 Review comment:
   I would simply just add an assert or require.
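
For illustration only, a minimal Python sketch of the fail-fast alternative suggested here: reject a null row up front instead of writing a placeholder such as "Null". The helper name `to_text_lines` and the error message are hypothetical and not part of the PR; in Scala the equivalent check would be an `assert` or `require` placed before `text.set`.

```python
def to_text_lines(rows):
    """Yield str(row) for each row, failing fast on None (hypothetical helper)."""
    for index, row in enumerate(rows):
        if row is None:
            # Fail loudly rather than silently writing a placeholder string.
            raise ValueError("row %d is None; cannot write it as text" % index)
        yield str(row)


print(list(to_text_lines([1, "a", 2.5])))
```

The design choice is that a null row is treated as a caller error, so the output file never contains a made-up value that could be mistaken for real data.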





[GitHub] HyukjinKwon commented on a change in pull request #23797: [WIP][SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs

2019-02-15 Thread GitBox
HyukjinKwon commented on a change in pull request #23797: 
[WIP][SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs
URL: https://github.com/apache/spark/pull/23797#discussion_r257448655
 
 

 ##
 File path: docs/sql-data-sources-avro.md
 ##
 @@ -137,6 +137,37 @@ StreamingQuery query = output
   .option("topic", "topic2")
   .start();
 
+{% endhighlight %}
+
+
+{% highlight python %}
+from pyspark.sql.functions import from_avro, to_avro
+
+# `from_avro` requires Avro schema in JSON string format.
+jsonFormatSchema = open("examples/src/main/resources/user.avsc", "r").read()
+
+df = spark
+  .readStream
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribe", "topic1")
+  .load()
+
+# 1. Decode the Avro data into a struct;
+# 2. Filter by column `favorite_color`;
+# 3. Encode the column `name` in Avro format.
+output = df
+  .select(from_avro("value", jsonFormatSchema).alias("user"))
+  .where("user.favorite_color == \"red\"")
 
 Review comment:
   not a big deal but maybe `'user.favorite_color == "red"'`





[GitHub] HyukjinKwon commented on a change in pull request #23797: [WIP][SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs

2019-02-15 Thread GitBox
HyukjinKwon commented on a change in pull request #23797: 
[WIP][SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs
URL: https://github.com/apache/spark/pull/23797#discussion_r257448637
 
 

 ##
 File path: docs/sql-data-sources-avro.md
 ##
 @@ -137,6 +137,37 @@ StreamingQuery query = output
   .option("topic", "topic2")
   .start();
 
+{% endhighlight %}
+
+
+{% highlight python %}
+from pyspark.sql.functions import from_avro, to_avro
+
+# `from_avro` requires Avro schema in JSON string format.
+jsonFormatSchema = open("examples/src/main/resources/user.avsc", "r").read()
+
+df = spark
+  .readStream
 
 Review comment:
   nit: I think it needs `\` for each line.
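
A small sketch of why the continuation matters: in Python a statement ends at the newline, so a chained call split across lines needs either a trailing `\` on each line or enclosing parentheses; without one of them the indented `.readStream` line is a syntax error. The `Reader` class below is a hypothetical stand-in for a fluent builder such as `spark.readStream`; it is not a PySpark API.

```python
class Reader:
    """Hypothetical fluent builder that records the calls made on it."""

    def __init__(self):
        self.calls = []

    def format(self, name):
        self.calls.append(("format", name))
        return self

    def option(self, key, value):
        self.calls.append(("option", key, value))
        return self


# Explicit line joining: a backslash at the end of every continued line.
r1 = Reader() \
    .format("kafka") \
    .option("subscribe", "topic1")

# Implicit line joining: wrap the whole chain in parentheses instead.
r2 = (
    Reader()
    .format("kafka")
    .option("subscribe", "topic1")
)

print(r1.calls == r2.calls)
```

Both spellings build the same chain; the parenthesized form avoids the risk of a stray space after a backslash, while the backslash form is what the comment asks for.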





[GitHub] HyukjinKwon commented on a change in pull request #23797: [WIP][SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs

2019-02-15 Thread GitBox
HyukjinKwon commented on a change in pull request #23797: 
[WIP][SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs
URL: https://github.com/apache/spark/pull/23797#discussion_r257448673
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -2402,6 +2402,64 @@ def to_csv(col, options={}):
     return Column(jc)
 
 
+@since(3.0)
+def from_avro(col, jsonFormatSchema, options={}):
+    """
+    Converts a binary column of avro format into its corresponding catalyst value. The specified
+    schema must match the read data, otherwise the behavior is undefined: it may fail or return
+    arbitrary result.
+
+    Avro is built-in but external data source module since Spark 2.4. Please deploy the application
+    as per the deployment section of "Apache Avro Data Source Guide".
+
+    :param data: the binary column.
+    :param jsonFormatSchema: the avro schema in JSON string format.
+    :param options: options to control how the Avro record is parsed.
+
+    >>> from pyspark.sql import Row
+    >>> from pyspark.sql.functions import from_avro, to_avro
+    >>> data = [(1, Row(name='Alice', age=2))]
+    >>> df = spark.createDataFrame(data, ("key", "value"))
+    >>> avroDf = df.select(to_avro(df.value).alias("avro"))
+    >>> avroDf.collect()
+    [Row(avro=bytearray(b'\\x00\\x00\\x04\\x00\\nAlice'))]
+    >>> jsonFormatSchema = '''{"type":"record","name":"topLevelRecord","fields":
+    ... [{"name":"avro","type":[{"type":"record","name":"value","namespace":"topLevelRecord",
+    ... "fields":[{"name":"age","type":["long","null"]},
+    ... {"name":"name","type":["string","null"]}]},"null"]}]}'''
+    >>> avroDf.select(from_avro(avroDf.avro, jsonFormatSchema).alias("value")).collect()
+    [Row(value=Row(avro=Row(age=2, name=u'Alice')))]
+    """
+
+    sc = SparkContext._active_spark_context
+    jc = sc._jvm.org.apache.spark.sql.avro.functions.from_avro(_to_java_column(col),
+                                                               jsonFormatSchema, options)
 
 Review comment:
   I believe the version below complies with PEP 8.
   
   ```python
   jc = sc._jvm.org.apache.spark.sql.avro.functions.from_avro(
       _to_java_column(col), jsonFormatSchema, options)
   ```





[GitHub] HeartSaVioR commented on issue #23706: [SPARK-26790][CORE] Change approach for retrieving executor logs and attributes: self-retrieve

2019-02-15 Thread GitBox
HeartSaVioR commented on issue #23706: [SPARK-26790][CORE] Change approach for 
retrieving executor logs and attributes: self-retrieve
URL: https://github.com/apache/spark/pull/23706#issuecomment-464294044
 
 
   Thanks all for reviewing and merging!





[GitHub] BryanCutler commented on a change in pull request #23795: [SPARK-26887][SQL][PYTHON] Create datetime.date directly instead of creating datetime64[ns] as intermediate data.

2019-02-15 Thread GitBox
BryanCutler commented on a change in pull request #23795: 
[SPARK-26887][SQL][PYTHON] Create datetime.date directly instead of creating 
datetime64[ns] as intermediate data.
URL: https://github.com/apache/spark/pull/23795#discussion_r257447143
 
 

 ##
 File path: python/pyspark/sql/types.py
 ##
 @@ -1681,38 +1681,53 @@ def from_arrow_schema(arrow_schema):
          for field in arrow_schema])
 
 
-def _check_series_convert_date(series, data_type):
-    """
-    Cast the series to datetime.date if it's a date type, otherwise returns the original series.
+def _arrow_column_to_pandas(column, data_type):
+    """ Convert Arrow Column to pandas Series.
+
+    If the given column is a date type column, creates a series of datetime.date directly instead
+    of creating datetime64[ns] as intermediate data.
 
-    :param series: pandas.Series
-    :param data_type: a Spark data type for the series
+    :param series: pyarrow.lib.Column
+    :param data_type: a Spark data type for the column
     """
-    import pyarrow
+    import pandas as pd
+    import pyarrow as pa
     from distutils.version import LooseVersion
-    # As of Arrow 0.12.0, date_as_objects is True by default, see ARROW-3910
-    if LooseVersion(pyarrow.__version__) < LooseVersion("0.12.0") and type(data_type) == DateType:
-        return series.dt.date
+    # Since Arrow 0.11.0, support date_as_object to return datetime.date instead of np.datetime64.
+    if LooseVersion(pa.__version__) < LooseVersion("0.11.0"):
+        if type(data_type) == DateType:
+            return pd.Series(column.to_pylist(), name=column.name)
+        else:
+            return column.to_pandas()
     else:
-        return series
+        return column.to_pandas(date_as_object=True)
+
 
+def _arrow_table_to_pandas(table, schema):
+    """ Convert Arrow Table to pandas DataFrame.
 
-def _check_dataframe_convert_date(pdf, schema):
-    """ Correct date type value to use datetime.date.
+    If the given table contains a date type column, use `_arrow_column_to_pandas` for pyarrow<0.11
+    or use `date_as_object` option for pyarrow>=0.11 to avoid creating datetime64[ns] as
+    intermediate data.
 
     Pandas DataFrame created from PyArrow uses datetime64[ns] for date type values, but we should
     use datetime.date to match the behavior with when Arrow optimization is disabled.
 
-    :param pdf: pandas.DataFrame
-    :param schema: a Spark schema of the pandas.DataFrame
+    :param table: pyarrow.lib.Table
+    :param schema: a Spark schema of the pyarrow.lib.Table
     """
-    import pyarrow
+    import pandas as pd
+    import pyarrow as pa
     from distutils.version import LooseVersion
-    # As of Arrow 0.12.0, date_as_objects is True by default, see ARROW-3910
-    if LooseVersion(pyarrow.__version__) < LooseVersion("0.12.0"):
-        for field in schema:
-            pdf[field.name] = _check_series_convert_date(pdf[field.name], field.dataType)
-        return pdf
+    # Since Arrow 0.11.0, support date_as_object to return datetime.date instead of np.datetime64.
+    if LooseVersion(pa.__version__) < LooseVersion("0.11.0"):
 
 Review comment:
   It would be nice to bump to 0.12.0 because I think that would allow us to 
clean up the code the most, but since an error is raised if the user doesn't 
have that version, it might be too restrictive. Let's definitely make a JIRA to 
discuss more.
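
To make the trade-off concrete, here is a hedged sketch of the version gate in the diff above, written without pyarrow so it can run anywhere. `parse_version` and `choose_date_conversion` are hypothetical helpers that only return the name of the code path taken; the 0.11.0 threshold comes from the diff, and requiring a newer minimum pyarrow would make the first branch unnecessary.

```python
def parse_version(text):
    """Turn '0.11.0' into (0, 11, 0). Hypothetical helper; numeric parts only."""
    return tuple(int(part) for part in text.split("."))


def choose_date_conversion(arrow_version, is_date_column):
    """Return the name of the conversion path the diff would take (illustrative)."""
    if parse_version(arrow_version) < parse_version("0.11.0"):
        # Older pyarrow: no date_as_object option, so date columns go via to_pylist().
        return "to_pylist" if is_date_column else "to_pandas"
    # Newer pyarrow: one call handles every column type.
    return "to_pandas(date_as_object=True)"


for version in ("0.9.0", "0.11.0", "0.12.0"):
    print(version, choose_date_conversion(version, is_date_column=True))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "0.9.0" sorts after "0.11.0", which would pick the wrong branch.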





[GitHub] AmplabJenkins removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix 
equality semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464291740
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102408/
   Test FAILed.





[GitHub] AmplabJenkins removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix 
equality semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464291739
 
 
   Build finished. Test FAILed.





[GitHub] SparkQA commented on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
SparkQA commented on issue #17968: [SPARK-9792] Make DenseMatrix equality 
semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464291581
 
 
   **[Test build #102408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102408/testReport)** for PR 17968 at commit [`311c94a`](https://github.com/apache/spark/commit/311c94a3d608b0b86f3ce39415639ec260e5af37).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.





[GitHub] AmplabJenkins commented on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #17968: [SPARK-9792] Make DenseMatrix equality 
semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464291739
 
 
   Build finished. Test FAILed.





[GitHub] SparkQA removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
SparkQA removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix 
equality semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464271730
 
 
   **[Test build #102408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102408/testReport)** for PR 17968 at commit [`311c94a`](https://github.com/apache/spark/commit/311c94a3d608b0b86f3ce39415639ec260e5af37).





[GitHub] AmplabJenkins commented on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #17968: [SPARK-9792] Make DenseMatrix equality 
semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464291740
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102408/
   Test FAILed.





[GitHub] BryanCutler commented on a change in pull request #23795: [SPARK-26887][SQL][PYTHON] Create datetime.date directly instead of creating datetime64[ns] as intermediate data.

2019-02-15 Thread GitBox
BryanCutler commented on a change in pull request #23795: 
[SPARK-26887][SQL][PYTHON] Create datetime.date directly instead of creating 
datetime64[ns] as intermediate data.
URL: https://github.com/apache/spark/pull/23795#discussion_r257446732
 
 

 ##
 File path: python/pyspark/sql/types.py
 ##
 @@ -1681,38 +1681,53 @@ def from_arrow_schema(arrow_schema):
          for field in arrow_schema])
 
 
-def _check_series_convert_date(series, data_type):
-    """
-    Cast the series to datetime.date if it's a date type, otherwise returns the original series.
+def _arrow_column_to_pandas(column, data_type):
+    """ Convert Arrow Column to pandas Series.
+
+    If the given column is a date type column, creates a series of datetime.date directly instead
+    of creating datetime64[ns] as intermediate data.
 
 Review comment:
   It would be nice to say that for dates this will return `datetime.date`, but 
yeah, maybe move the part about datetime64[ns] as intermediate data to an 
internal comment.  `_arrow_table_to_pandas` has a comment that the reason for 
this is to match pyspark w/o arrow, but maybe it would be good to add that here 
as well.





[GitHub] yucai commented on a change in pull request #21149: [SPARK-24076][SQL] Use different seed in HashAggregate to avoid hash conflict

2019-02-15 Thread GitBox
yucai commented on a change in pull request #21149: [SPARK-24076][SQL] Use 
different seed in HashAggregate to avoid hash conflict
URL: https://github.com/apache/spark/pull/21149#discussion_r257445460
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
 ##
 @@ -755,7 +755,10 @@ case class HashAggregateExec(
     }
 
     // generate hash code for key
-    val hashExpr = Murmur3Hash(groupingExpressions, 42)
+    // SPARK-24076: HashAggregate uses the same hash algorithm on the same expressions
+    // as ShuffleExchange, it may lead to bad hash conflict when shuffle.partitions=8192*n,
+    // pick a different seed to avoid this conflict
+    val hashExpr = Murmur3Hash(groupingExpressions, 48)
 
 Review comment:
   @cloud-fan you mean `unsafeRowKeys.hashCode()`, right?
   I think it is a good idea: an unsafe row also includes the [null bit set] 
etc., so the result should be different, and we would no longer need the odd 
`48` seed. Do you want me to create a follow-up PR?
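
The collision described in SPARK-24076 can be sketched in a few lines of Python. This is a toy model, not Spark's code: `toy_hash` is a hypothetical seeded hash built on the 32-bit Murmur3 finalizer, 8192 stands in for the shuffle partition count, and 1024 for the number of hash-map buckets. After a shuffle, every key in one partition shares the low bits of its hash, so reusing the same seed for the aggregation map sends all of them to one bucket.

```python
def toy_hash(key, seed):
    """Hypothetical seeded 32-bit hash (Murmur3 finalizer); not Spark's Murmur3Hash."""
    h = (key ^ seed) & 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


PARTITIONS = 8192  # shuffle partitions (assumed value)
BUCKETS = 1024     # hash-map buckets (assumed value; divides PARTITIONS)

# Keys that a shuffle hashing with seed 42 would route to partition 0.
keys = [k for k in range(200000) if toy_hash(k, 42) % PARTITIONS == 0]

# Bucket indexes used by the aggregation map for those keys.
same_seed = {toy_hash(k, 42) % BUCKETS for k in keys}
other_seed = {toy_hash(k, 48) % BUCKETS for k in keys}

print(len(keys), len(same_seed), len(other_seed))
```

With the same seed the bucket set has exactly one element by construction, because the hash is a multiple of 8192 and therefore of 1024. With a different seed, or with a different hash such as the row's own `hashCode()`, the keys should spread over several buckets.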





[GitHub] SparkQA commented on issue #23750: [SPARK-19712][SQL] Pushing Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.

2019-02-15 Thread GitBox
SparkQA commented on issue #23750: [SPARK-19712][SQL] Pushing Left Semi and 
Left Anti joins through Project, Aggregate, Window, Union etc.
URL: https://github.com/apache/spark/pull/23750#issuecomment-464281896
 
 
   **[Test build #102410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102410/testReport)** for PR 23750 at commit [`edfe3d7`](https://github.com/apache/spark/commit/edfe3d7f1771ef72b6eae1e31840aca8b49eebf3).





[GitHub] AmplabJenkins removed a comment on issue #23750: [SPARK-19712][SQL] Pushing Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23750: [SPARK-19712][SQL] Pushing 
Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.
URL: https://github.com/apache/spark/pull/23750#issuecomment-464281636
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #23750: [SPARK-19712][SQL] Pushing Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23750: [SPARK-19712][SQL] Pushing Left Semi 
and Left Anti joins through Project, Aggregate, Window, Union etc.
URL: https://github.com/apache/spark/pull/23750#issuecomment-464281636
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23750: [SPARK-19712][SQL] Pushing Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23750: [SPARK-19712][SQL] Pushing 
Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.
URL: https://github.com/apache/spark/pull/23750#issuecomment-464281638
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7990/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23750: [SPARK-19712][SQL] Pushing Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23750: [SPARK-19712][SQL] Pushing Left Semi 
and Left Anti joins through Project, Aggregate, Window, Union etc.
URL: https://github.com/apache/spark/pull/23750#issuecomment-464281638
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7990/
   Test PASSed.





[GitHub] dilipbiswal commented on issue #23780: [SPARK-26864][SQL][BACKPORT-2.4] Query may return incorrect result when python udf is used as a join condition and the udf uses attributes from both leg

2019-02-15 Thread GitBox
dilipbiswal commented on issue #23780: [SPARK-26864][SQL][BACKPORT-2.4] Query 
may return incorrect result when python udf is used as a join condition and the 
udf uses attributes from both legs of left semi join
URL: https://github.com/apache/spark/pull/23780#issuecomment-464281489
 
 
   @cloud-fan Can this be merged now?





[GitHub] dilipbiswal commented on a change in pull request #23750: [SPARK-19712][SQL] Pushing Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.

2019-02-15 Thread GitBox
dilipbiswal commented on a change in pull request #23750: [SPARK-19712][SQL] 
Pushing Left Semi and Left Anti joins through Project, Aggregate, Window, Union 
etc.
URL: https://github.com/apache/spark/pull/23750#discussion_r257444587
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ##
 @@ -1188,6 +1189,190 @@ object PushDownPredicate extends Rule[LogicalPlan] 
with PredicateHelper {
   }
 }
 
+object PushDownLeftSemiAntiJoin extends Rule[LogicalPlan] with PredicateHelper 
{
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+// Similar to the above Filter over Project
+// LeftSemi/LeftAnti over Project
+case join @ Join(p @ Project(pList, gChild), rightOp, 
LeftSemiOrAnti(joinType), joinCond, hint)
+  if pList.forall(_.deterministic) && 
!ScalarSubquery.hasScalarSubquery(pList) &&
+canPushThroughCondition(Seq(gChild), joinCond, rightOp) =>
+  if (joinCond.isEmpty) {
+// No join condition, just push down the Join below Project
+Project(pList, Join(gChild, rightOp, joinType, joinCond, hint))
+  } else {
+// Create a map of Aliases to their values from the child projection.
+// e.g., 'SELECT a + b AS c, d ...' produces Map(c -> a + b).
+val aliasMap = AttributeMap(pList.collect {
+  case a: Alias => (a.toAttribute, a.child)
+})
+val newJoinCond = if (aliasMap.nonEmpty) {
+  Option(replaceAlias(joinCond.get, aliasMap))
+} else {
+  joinCond
+}
+Project(pList, Join(gChild, rightOp, joinType, newJoinCond, hint))
+  }
+
+// Similar to the above Filter over Aggregate
+// LeftSemi/LeftAnti over Aggregate
+case join @ Join(aggregate: Aggregate, rightOp, LeftSemiOrAnti(joinType), 
joinCond, hint)
+  if aggregate.aggregateExpressions.forall(_.deterministic)
+&& aggregate.groupingExpressions.nonEmpty =>
+  if (joinCond.isEmpty) {
+// No join condition, just push down Join below Aggregate
+aggregate.copy(child = Join(aggregate.child, rightOp, joinType, 
joinCond, hint))
+  } else {
+// Find all the aliased expressions in the aggregate list that don't 
include any actual
+// AggregateExpression, and create a map from the alias to the 
expression
+val aliasMap = AttributeMap(aggregate.aggregateExpressions.collect {
+  case a: Alias if 
a.child.find(_.isInstanceOf[AggregateExpression]).isEmpty =>
+(a.toAttribute, a.child)
+})
+
+// For each join condition, expand the alias and
+// check if the condition can be evaluated using
+// attributes produced by the aggregate operator's child operator.
+
+val (pushDown, stayUp) = 
splitConjunctivePredicates(joinCond.get).partition { cond =>
+  val replaced = replaceAlias(cond, aliasMap)
+  cond.references.nonEmpty &&
 
 Review comment:
@maropu Thanks for reviewing. I have addressed your comments. Please look 
through it when you get a chance. Thanks.
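
The alias handling in the diff above (a projection such as `SELECT a + b AS c` yields a map from `c` to `a + b`, which is then substituted into the join condition before the join is pushed below the Project) can be sketched roughly as follows. This is a toy Python model, not Catalyst: expressions are nested tuples, attribute names are plain strings, and `replace_alias` is a hypothetical stand-in for the `replaceAlias` helper used in the patch.

```python
def replace_alias(expr, alias_map):
    """Recursively substitute aliased attribute names with their definitions.

    Toy model (assumption): an expression is either a string (attribute name)
    or a tuple of the form (operator, operand, operand, ...).
    """
    if isinstance(expr, str):
        # Leaf: swap the alias for the expression it names, if there is one.
        return alias_map.get(expr, expr)
    op, *args = expr
    return (op, *[replace_alias(a, alias_map) for a in args])


# 'SELECT a + b AS c' produces the map {c -> a + b}.
alias_map = {"c": ("+", "a", "b")}
# Join condition written against the Project's output: c = x
join_cond = ("=", "c", "x")
pushed_cond = replace_alias(join_cond, alias_map)
```

After substitution the condition refers only to attributes of the Project's child (and of the right side of the join), which is what makes it possible to evaluate the join below the Project.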





[GitHub] AmplabJenkins commented on issue #23804: [WIP][SPARK-26896] JDK 11 module adjustments for running tests

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23804: [WIP][SPARK-26896] JDK 11 module 
adjustments for running tests
URL: https://github.com/apache/spark/pull/23804#issuecomment-464279496
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102404/
   Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23804: [WIP][SPARK-26896] JDK 11 module adjustments for running tests

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23804: [WIP][SPARK-26896] JDK 11 
module adjustments for running tests
URL: https://github.com/apache/spark/pull/23804#issuecomment-464279496
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102404/
   Test PASSed.





[GitHub] HyukjinKwon commented on issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hadoop 3 profile

2019-02-15 Thread GitBox
HyukjinKwon commented on issue #21588: [SPARK-24590][BUILD] Make Jenkins tests 
passed with hadoop 3 profile
URL: https://github.com/apache/spark/pull/21588#issuecomment-464279463
 
 
   Ping for what? The Hive upgrade, which blocks this PR, is in progress: 
https://github.com/apache/spark/pull/23788
   
   Please give input here and in the discussion thread @wangyum pointed out above.





[GitHub] AmplabJenkins commented on issue #23804: [WIP][SPARK-26896] JDK 11 module adjustments for running tests

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23804: [WIP][SPARK-26896] JDK 11 module 
adjustments for running tests
URL: https://github.com/apache/spark/pull/23804#issuecomment-464279495
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23804: [WIP][SPARK-26896] JDK 11 module adjustments for running tests

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23804: [WIP][SPARK-26896] JDK 11 
module adjustments for running tests
URL: https://github.com/apache/spark/pull/23804#issuecomment-464279495
 
 
   Merged build finished. Test PASSed.





[GitHub] SparkQA removed a comment on issue #23804: [WIP][SPARK-26896] JDK 11 module adjustments for running tests

2019-02-15 Thread GitBox
SparkQA removed a comment on issue #23804: [WIP][SPARK-26896] JDK 11 module 
adjustments for running tests
URL: https://github.com/apache/spark/pull/23804#issuecomment-464227566
 
 
   **[Test build #102404 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102404/testReport)**
 for PR 23804 at commit 
[`16caf67`](https://github.com/apache/spark/commit/16caf6733c893204fab2df4603c7abf0c3106bf7).





[GitHub] SparkQA commented on issue #23804: [WIP][SPARK-26896] JDK 11 module adjustments for running tests

2019-02-15 Thread GitBox
SparkQA commented on issue #23804: [WIP][SPARK-26896] JDK 11 module adjustments 
for running tests
URL: https://github.com/apache/spark/pull/23804#issuecomment-464279361
 
 
   **[Test build #102404 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102404/testReport)**
 for PR 23804 at commit 
[`16caf67`](https://github.com/apache/spark/commit/16caf6733c893204fab2df4603c7abf0c3106bf7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] SparkQA commented on issue #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
SparkQA commented on issue #23602: [SPARK-26674][CORE]Consolidate 
CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#issuecomment-464278279
 
 
   **[Test build #102409 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102409/testReport)**
 for PR 23602 at commit 
[`3aad18a`](https://github.com/apache/spark/commit/3aad18a4ba96b5717c16ebc8a0d23b0a3986c634).





[GitHub] AmplabJenkins removed a comment on issue #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23602: [SPARK-26674][CORE]Consolidate 
CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#issuecomment-464278156
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7989/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23602: [SPARK-26674][CORE]Consolidate 
CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#issuecomment-464278155
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23602: [SPARK-26674][CORE]Consolidate 
CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#issuecomment-464278155
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23602: [SPARK-26674][CORE]Consolidate 
CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#issuecomment-464278156
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7989/
   Test PASSed.





[GitHub] SongYadong commented on issue #23794: [SPARK-26884][CORE] Let task acquire memory accurately when using spilled memory

2019-02-15 Thread GitBox
SongYadong commented on issue #23794: [SPARK-26884][CORE] Let task acquire 
memory accurately when using spilled memory
URL: https://github.com/apache/spark/pull/23794#issuecomment-464275258
 
 
   Thanks for the review. It's true that the memory manager will try to give the 
right amount. But once we reach the spill path, the memory manager probably 
can't provide the needed memory right now. If we then request an amount that 
cannot be satisfied after spilling (when `released` < `required - got`), the 
memory manager will make a redundant effort to get memory and may even block 
temporarily. With accurate control, I think acquiring memory may return faster.
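
A minimal sketch of the idea in this comment, written in Python rather than Spark's Java. The toy `Pool` class and the `acquire_with_spill` function are hypothetical names used only for illustration, not Spark APIs. After a spill frees `released` bytes, the follow-up request is capped at `min(released, required - got)`, so it never asks for more than the spill is known to have made available.

```python
class Pool:
    """Toy memory pool; a hypothetical stand-in for a memory manager."""

    def __init__(self, free):
        self.free = free

    def acquire(self, n):
        # Grant as much as is available, up to n bytes.
        granted = min(n, self.free)
        self.free -= granted
        return granted

    def release(self, n):
        self.free += n


def acquire_with_spill(pool, required, spillable):
    """Try to acquire `required` bytes, spilling `spillable` bytes once if short."""
    got = pool.acquire(required)
    if got < required and spillable > 0:
        # Spill: hand our own spillable memory back to the pool.
        released = spillable
        pool.release(released)
        # Ask only for what the spill freed, never more than is still needed.
        got += pool.acquire(min(released, required - got))
    return got
```

In this toy pool an uncapped request would return the same amount, because `acquire` simply grants what is free. The cap only matters for a manager that does extra work, or blocks, when asked for more than it can currently give, which is the situation the comment describes.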





[GitHub] liupc commented on a change in pull request #23602: [SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame

2019-02-15 Thread GitBox
liupc commented on a change in pull request #23602: 
[SPARK-26674][CORE]Consolidate CompositeByteBuf when reading large frame
URL: https://github.com/apache/spark/pull/23602#discussion_r257441590
 
 

 ##
 File path: 
common/network-common/src/main/java/org/apache/spark/network/util/TransportFrameDecoder.java
 ##
 @@ -123,30 +140,54 @@ private long decodeFrameSize() {
 
   private ByteBuf decodeNext() {
 long frameSize = decodeFrameSize();
-if (frameSize == UNKNOWN_FRAME_SIZE || totalSize < frameSize) {
+if (frameSize == UNKNOWN_FRAME_SIZE) {
   return null;
 }
 
-// Reset size for next frame.
-nextFrameSize = UNKNOWN_FRAME_SIZE;
-
-Preconditions.checkArgument(frameSize < MAX_FRAME_SIZE, "Too large frame: 
%s", frameSize);
-Preconditions.checkArgument(frameSize > 0, "Frame length should be 
positive: %s", frameSize);
+if (frameBuf == null) {
+  Preconditions.checkArgument(frameSize < MAX_FRAME_SIZE,
+  "Too large frame: %s", frameSize);
+  Preconditions.checkArgument(frameSize > 0,
+  "Frame length should be positive: %s", frameSize);
+  frameRemainingBytes = (int) frameSize;
 
-// If the first buffer holds the entire frame, return it.
-int remaining = (int) frameSize;
-if (buffers.getFirst().readableBytes() >= remaining) {
-  return nextBufferForFrame(remaining);
+  // If buffers is empty, then return immediately for more input data.
+  if (buffers.isEmpty()) {
+return null;
+  }
+  // Otherwise, if the first buffer holds the entire frame, we attempt to
+  // build frame with it and return.
+  if (buffers.getFirst().readableBytes() >= frameRemainingBytes) {
+// Reset buf and size for next frame.
+frameBuf = null;
+nextFrameSize = UNKNOWN_FRAME_SIZE;
+return nextBufferForFrame(frameRemainingBytes);
+  }
+  // Other cases, create a composite buffer to manage all the buffers.
+  frameBuf = buffers.getFirst().alloc().compositeBuffer(Integer.MAX_VALUE);
 }
 
-// Otherwise, create a composite buffer.
-CompositeByteBuf frame = 
buffers.getFirst().alloc().compositeBuffer(Integer.MAX_VALUE);
-while (remaining > 0) {
-  ByteBuf next = nextBufferForFrame(remaining);
-  remaining -= next.readableBytes();
-  frame.addComponent(next).writerIndex(frame.writerIndex() + 
next.readableBytes());
+while (frameRemainingBytes > 0 && !buffers.isEmpty()) {
+  ByteBuf next = nextBufferForFrame(frameRemainingBytes);
+  frameRemainingBytes -= next.readableBytes();
+  frameBuf.addComponent(true, next);
 }
-assert remaining == 0;
+// If the delta size of frameBuf exceeds the threshold, then we do 
consolidation
+// to reduce memory consumption.
+if (frameBuf.capacity() - consolidatedFrameBufSize > consolidateThreshold) 
{
+  int newNumComponents = frameBuf.numComponents() - 
consolidatedNumComponents;
+  frameBuf.consolidate(consolidatedNumComponents, newNumComponents);
+  consolidatedFrameBufSize = frameBuf.capacity();
+  consolidatedNumComponents = frameBuf.numComponents();
+}
+if (frameRemainingBytes > 0) {
+  return null;
+}
+
+// Reset buf and size for next frame.
+ByteBuf frame = frameBuf;
+frameBuf = null;
+nextFrameSize = UNKNOWN_FRAME_SIZE;
 
 Review comment:
   Yes, I can add some code to test multiple messages, and we just need to do 
the same check on the consolidated buffer's capacity. I think this is more 
result-oriented.
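
The threshold-based consolidation in the diff above can be sketched as follows. This is a simplified Python model, not Netty: `FrameBuf` is a hypothetical stand-in for a `CompositeByteBuf` that holds a list of byte chunks, and `consolidate_threshold` mirrors `consolidateThreshold` from the patch. Unlike the patch, which checks the threshold after draining all available buffers, this sketch checks after every added chunk.

```python
class FrameBuf:
    """Toy composite buffer: a list of byte chunks (hypothetical model)."""

    def __init__(self, consolidate_threshold):
        self.components = []
        self.consolidate_threshold = consolidate_threshold
        self.consolidated_size = 0        # bytes covered by earlier consolidations
        self.consolidated_components = 0  # components covered by earlier consolidations

    def capacity(self):
        return sum(len(c) for c in self.components)

    def add(self, chunk):
        self.components.append(bytes(chunk))
        # Merge only the components added since the last consolidation,
        # and only once their combined size exceeds the threshold.
        if self.capacity() - self.consolidated_size > self.consolidate_threshold:
            start = self.consolidated_components
            merged = b"".join(self.components[start:])
            self.components[start:] = [merged]
            self.consolidated_size = self.capacity()
            self.consolidated_components = len(self.components)


buf = FrameBuf(consolidate_threshold=4)
for chunk in (b"ab", b"cd", b"e", b"fg", b"hij"):
    buf.add(chunk)
```

Each consolidation copies only the new tail of the buffer, so earlier merged components are not copied again. A test over multiple messages, as proposed in the comment, could assert on the resulting capacity in the same way for each frame.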





[GitHub] asfgit closed pull request #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
asfgit closed pull request #18339: [SPARK-21094][PYTHON] Add popen_kwargs to 
launch_gateway
URL: https://github.com/apache/spark/pull/18339
 
 
   





[GitHub] holdenk commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
holdenk commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to 
launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464273651
 
 
   Merged to master





[GitHub] AmplabJenkins removed a comment on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #18339: [SPARK-21094][PYTHON] Add 
popen_kwargs to launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464273374
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #18339: [SPARK-21094][PYTHON] Add 
popen_kwargs to launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464273377
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102407/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs 
to launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464273377
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102407/
   Test PASSed.





[GitHub] SparkQA removed a comment on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
SparkQA removed a comment on issue #18339: [SPARK-21094][PYTHON] Add 
popen_kwargs to launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464270324
 
 
   **[Test build #102407 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102407/testReport)**
 for PR 18339 at commit 
[`ea267c6`](https://github.com/apache/spark/commit/ea267c68c805951c5ee2fb4fccd9f8fb4a288297).





[GitHub] AmplabJenkins commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs 
to launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464273374
 
 
   Merged build finished. Test PASSed.





[GitHub] SparkQA commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
SparkQA commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to 
launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464273285
 
 
   **[Test build #102407 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102407/testReport)**
 for PR 18339 at commit 
[`ea267c6`](https://github.com/apache/spark/commit/ea267c68c805951c5ee2fb4fccd9f8fb4a288297).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] holdenk commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
holdenk commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to 
launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464273171
 
 
   Looks like Jenkins listened; everything passed, so I will merge to master.





[GitHub] edwinalu commented on a change in pull request #23767: [SPARK-26329][CORE][WIP] Faster polling of executor memory metrics.

2019-02-15 Thread GitBox
edwinalu commented on a change in pull request #23767: [SPARK-26329][CORE][WIP] 
Faster polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r257440553
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/SparkContext.scala
 ##
 @@ -2380,10 +2381,14 @@ class SparkContext(config: SparkConf) extends Logging {
 
   /** Reports heartbeat metrics for the driver. */
   private def reportHeartBeat(): Unit = {
-val driverUpdates = _heartbeater.getCurrentMetrics()
 
 Review comment:
   Would it be useful to poll more frequently for driver metrics as well?





[GitHub] edwinalu commented on a change in pull request #23767: [SPARK-26329][CORE][WIP] Faster polling of executor memory metrics.

2019-02-15 Thread GitBox
edwinalu commented on a change in pull request #23767: [SPARK-26329][CORE][WIP] 
Faster polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r257440322
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
 ##
 @@ -840,8 +952,25 @@ private[spark] class Executor(
 val accumUpdates = new ArrayBuffer[(Long, Seq[AccumulatorV2[_, _]])]()
 val curGCTime = computeTotalGcTime()
 
-// get executor level memory metrics
-val executorUpdates = heartbeater.getCurrentMetrics()
+// if not polling in a separate poller, poll here
+if (poller == null) {
+  poll()
+}
+
+// build the executor level memory metrics
+val executorUpdates = new HashMap[StageKey, ExecutorMetrics]
+
+def peaksForStage(k: StageKey, v: AtomicLong): (StageKey, AtomicLongArray) 
=
+  if (v.get() > 0) (k, stageMetricPeaks.get(k)) else null
+
+def addPeaks(nested: (StageKey, AtomicLongArray)): Unit = {
+  val (k, v) = nested
+  executorUpdates.put(k, new ExecutorMetrics(v))
+  // at the same time, reset the peaks in stageMetricPeaks
+  stageMetricPeaks.put(k, new 
AtomicLongArray(ExecutorMetricType.numMetrics))
+}
+
+activeStages.forEach[(StageKey, AtomicLongArray)](LONG_MAX_VALUE, 
peaksForStage, addPeaks)
 
 Review comment:
   There's a corner case here: if the task fails, the metrics may not get 
sent.
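
A simplified Python sketch of the collect-and-reset pattern in the diff above, assuming peaks are tracked per stage and reported only for stages that currently have running tasks. `PeakTracker` and its methods are hypothetical names, not Spark classes, and the sketch ignores the thread safety that `AtomicLong` and `AtomicLongArray` provide in the patch.

```python
NUM_METRICS = 3  # illustrative; stands in for ExecutorMetricType.numMetrics


class PeakTracker:
    """Toy per-stage peak tracker (hypothetical model, single-threaded)."""

    def __init__(self):
        self.active_tasks = {}  # stage key -> number of running tasks
        self.peaks = {}         # stage key -> list of peak metric values

    def task_started(self, stage):
        self.active_tasks[stage] = self.active_tasks.get(stage, 0) + 1
        self.peaks.setdefault(stage, [0] * NUM_METRICS)

    def task_ended(self, stage):
        self.active_tasks[stage] -= 1

    def poll(self, sample):
        # Fold one sample into the peaks of every stage with running tasks.
        for stage, count in self.active_tasks.items():
            if count > 0:
                self.peaks[stage] = [max(p, s) for p, s in zip(self.peaks[stage], sample)]

    def heartbeat(self):
        # Report peaks for active stages, then reset them for the next interval.
        updates = {}
        for stage, count in self.active_tasks.items():
            if count > 0:
                updates[stage] = self.peaks[stage]
                self.peaks[stage] = [0] * NUM_METRICS
        return updates
```

In this model, one way the corner case could arise is that a task ends (for example by failing) between a poll and the next heartbeat: the stage's count drops to zero, `heartbeat` skips it, and the peaks recorded for that interval are never reported. Whether the patch behaves the same way depends on code outside the quoted hunk.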





[GitHub] edwinalu commented on a change in pull request #23767: [SPARK-26329][CORE][WIP] Faster polling of executor memory metrics.

2019-02-15 Thread GitBox
edwinalu commented on a change in pull request #23767: [SPARK-26329][CORE][WIP] 
Faster polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r257440251
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/SparkContext.scala
 ##
 @@ -2380,10 +2381,14 @@ class SparkContext(config: SparkConf) extends Logging {
 
   /** Reports heartbeat metrics for the driver. */
   private def reportHeartBeat(): Unit = {
-val driverUpdates = _heartbeater.getCurrentMetrics()
+val currentMetrics = ExecutorMetrics.getCurrentMetrics(env.memoryManager)
+val driverUpdates = new HashMap[(Int, Int), ExecutorMetrics]
+// In the driver, we do not track per-stage metrics, so use a dummy stage
+// for the key
+driverUpdates.put((-1, -1), new ExecutorMetrics(currentMetrics))
 
 Review comment:
   Yes, the stage information is added in onExecutorMetricsUpdate, so it is not needed here.





[GitHub] holdenk commented on a change in pull request #23795: [SPARK-26887][SQL][PYTHON] Create datetime.date directly instead of creating datetime64[ns] as intermediate data.

2019-02-15 Thread GitBox
holdenk commented on a change in pull request #23795: 
[SPARK-26887][SQL][PYTHON] Create datetime.date directly instead of creating 
datetime64[ns] as intermediate data.
URL: https://github.com/apache/spark/pull/23795#discussion_r257439925
 
 

 ##
 File path: python/pyspark/sql/types.py
 ##
 @@ -1681,38 +1681,53 @@ def from_arrow_schema(arrow_schema):
  for field in arrow_schema])
 
 
-def _check_series_convert_date(series, data_type):
-"""
-Cast the series to datetime.date if it's a date type, otherwise returns the original series.
+def _arrow_column_to_pandas(column, data_type):
+""" Convert Arrow Column to pandas Series.
+
+If the given column is a date type column, creates a series of datetime.date directly instead
+of creating datetime64[ns] as intermediate data.
 
-:param series: pandas.Series
-:param data_type: a Spark data type for the series
+:param series: pyarrow.lib.Column
+:param data_type: a Spark data type for the column
 """
-import pyarrow
+import pandas as pd
+import pyarrow as pa
 from distutils.version import LooseVersion
-# As of Arrow 0.12.0, date_as_objects is True by default, see ARROW-3910
-if LooseVersion(pyarrow.__version__) < LooseVersion("0.12.0") and type(data_type) == DateType:
-return series.dt.date
+# Since Arrow 0.11.0, support date_as_object to return datetime.date instead of np.datetime64.
 
 Review comment:
   Include a comment about the overflow here so we know why we are avoiding 
`np.datetime64`.
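
For context on the overflow mentioned in this comment: `datetime64[ns]` stores a signed 64-bit count of nanoseconds since the Unix epoch, so it covers only a few centuries around 1970, while `datetime.date` covers years 1 through 9999. The following is a minimal sketch using only the standard library; the helper name `fits_in_datetime64_ns` is hypothetical and is not part of PySpark, pandas, or Arrow. It computes the approximate bounds and checks whether a given date falls inside them.

```python
import datetime

# datetime64[ns] is a signed 64-bit count of nanoseconds since the Unix epoch.
INT64_MAX = 2**63 - 1
EPOCH = datetime.datetime(1970, 1, 1)

# datetime.timedelta has microsecond resolution, so convert ns -> us (floor).
NS_MAX = EPOCH + datetime.timedelta(microseconds=INT64_MAX // 1000)
NS_MIN = EPOCH - datetime.timedelta(microseconds=INT64_MAX // 1000)


def fits_in_datetime64_ns(d):
    """Hypothetical helper: True if midnight of date `d` is representable in ns."""
    midnight = datetime.datetime(d.year, d.month, d.day)
    return NS_MIN <= midnight <= NS_MAX


print(NS_MIN.date(), NS_MAX.date())
print(fits_in_datetime64_ns(datetime.date(2019, 2, 15)))
print(fits_in_datetime64_ns(datetime.date(9999, 12, 31)))
```

A date such as 9999-12-31 is valid as a `datetime.date` but lies outside the nanosecond range, which is presumably the overflow the comment asks to document.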





[GitHub] AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] 
Keep track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-464271644
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102403/
   Test PASSed.





[GitHub] holdenk commented on a change in pull request #23795: [SPARK-26887][SQL][PYTHON] Create datetime.date directly instead of creating datetime64[ns] as intermediate data.

2019-02-15 Thread GitBox
holdenk commented on a change in pull request #23795: 
[SPARK-26887][SQL][PYTHON] Create datetime.date directly instead of creating 
datetime64[ns] as intermediate data.
URL: https://github.com/apache/spark/pull/23795#discussion_r257439790
 
 

 ##
 File path: python/pyspark/sql/types.py
 ##
 @@ -1681,38 +1681,53 @@ def from_arrow_schema(arrow_schema):
  for field in arrow_schema])
 
 
-def _check_series_convert_date(series, data_type):
-"""
-Cast the series to datetime.date if it's a date type, otherwise returns the original series.
+def _arrow_column_to_pandas(column, data_type):
+""" Convert Arrow Column to pandas Series.
+
+If the given column is a date type column, creates a series of datetime.date directly instead
+of creating datetime64[ns] as intermediate data.
 
 Review comment:
   minor: I think these details belong as a comment internally rather than in 
the doc string.





[GitHub] AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] 
Keep track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-464271641
 
 
   Merged build finished. Test PASSed.





[GitHub] SparkQA commented on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
SparkQA commented on issue #17968: [SPARK-9792] Make DenseMatrix equality 
semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464271730
 
 
   **[Test build #102408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102408/testReport)** for PR 17968 at commit [`311c94a`](https://github.com/apache/spark/commit/311c94a3d608b0b86f3ce39415639ec260e5af37).





[GitHub] AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep 
track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-464271641
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix 
equality semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464271531
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7988/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #17968: [SPARK-9792] Make DenseMatrix equality 
semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464271529
 
 
   Build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep 
track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-464271644
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102403/
   Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix 
equality semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464271529
 
 
   Build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #17968: [SPARK-9792] Make DenseMatrix equality 
semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464271531
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7988/
   Test PASSed.





[GitHub] SparkQA removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-02-15 Thread GitBox
SparkQA removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep 
track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-464192922
 
 
   **[Test build #102403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102403/testReport)** for PR 19045 at commit [`46b5725`](https://github.com/apache/spark/commit/46b5725f763e1858704c408b7a55f49f717790b0).





[GitHub] wypoon commented on a change in pull request #23767: [SPARK-26329][CORE][WIP] Faster polling of executor memory metrics.

2019-02-15 Thread GitBox
wypoon commented on a change in pull request #23767: [SPARK-26329][CORE][WIP] 
Faster polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r257439819
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##
 @@ -40,7 +41,7 @@ private[spark] case class Heartbeat(
 executorId: String,
 accumUpdates: Array[(Long, Seq[AccumulatorV2[_, _]])], // taskId -> accumulator updates
 blockManagerId: BlockManagerId,
-executorUpdates: ExecutorMetrics) // executor level updates
+executorUpdates: Map[(Int, Int), ExecutorMetrics]) // executor level updates
 
 Review comment:
   Sure, will do.





[GitHub] SparkQA commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-02-15 Thread GitBox
SparkQA commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of 
nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-464271447
 
 
   **[Test build #102403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102403/testReport)** for PR 19045 at commit [`46b5725`](https://github.com/apache/spark/commit/46b5725f763e1858704c408b7a55f49f717790b0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] AmplabJenkins removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #17968: [SPARK-9792] Make DenseMatrix 
equality semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-453656628
 
 
   Can one of the admins verify this patch?





[GitHub] holdenk commented on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
holdenk commented on issue #17968: [SPARK-9792] Make DenseMatrix equality 
semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464271175
 
 
   jenkins ok to test 





[GitHub] holdenk commented on issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2019-02-15 Thread GitBox
holdenk commented on issue #17968: [SPARK-9792] Make DenseMatrix equality 
semantical
URL: https://github.com/apache/spark/pull/17968#issuecomment-464271163
 
 
   Jenkins OK to test
   Are you still actively working on this, and if so, would you update it to master?





[GitHub] holdenk commented on issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics in DataFrame-based API

2019-02-15 Thread GitBox
holdenk commented on issue #20028: [SPARK-19053][ML]Supporting multiple 
evaluation metrics in DataFrame-based API
URL: https://github.com/apache/spark/pull/20028#issuecomment-464271070
 
 
   Is this still being actively worked on?





[GitHub] AmplabJenkins removed a comment on issue #23792: [SPARK-26882] Check the Kubernetes integration tests scalatyle

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23792: [SPARK-26882] Check the 
Kubernetes integration tests scalatyle
URL: https://github.com/apache/spark/pull/23792#issuecomment-464270877
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102402/
   Test PASSed.





[GitHub] holdenk commented on issue #23793: [SPARK-24736][k8s] Let spark-submit handle dependency resolution.

2019-02-15 Thread GitBox
holdenk commented on issue #23793: [SPARK-24736][k8s] Let spark-submit handle 
dependency resolution.
URL: https://github.com/apache/spark/pull/23793#issuecomment-464270862
 
 
   So I'm a little confused here, since if we look at the YARN cluster manager we also see similar logic around setting the PYTHONPATH.
   
   Have you tested this with a zip file or egg as a dependency? I don't think Python will expand all zip files in the working directory by default.
   
   cc @ifilonenko 
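
The zip-file concern in this comment can be checked in isolation. The sketch below uses only the standard library; the archive name `deps.zip` and module name `dep_in_zip` are made up for illustration and do not come from the PR. It suggests that having a directory on `sys.path` does not make modules inside a zip archive in that directory importable; the archive itself has to be listed on `sys.path` (or in `PYTHONPATH`).

```python
import importlib
import os
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "deps.zip")

# Build an archive containing one tiny module (hypothetical name).
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("dep_in_zip.py", "VALUE = 42\n")

# Case 1: only the directory holding the archive is on sys.path,
# analogous to the archive merely sitting in the working directory.
sys.path.insert(0, workdir)
importlib.invalidate_caches()
try:
    importlib.import_module("dep_in_zip")
    found_without_zip_entry = True
except ImportError:
    found_without_zip_entry = False

# Case 2: the archive itself is on sys.path, so zipimport can load from it.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()
module = importlib.import_module("dep_in_zip")
found_with_zip_entry = module.VALUE == 42

print(found_without_zip_entry, found_with_zip_entry)
```

This says nothing about how spark-submit or the Kubernetes backend actually populate the path; it only illustrates the default interpreter behaviour the question refers to.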
   





[GitHub] AmplabJenkins removed a comment on issue #23792: [SPARK-26882] Check the Kubernetes integration tests scalatyle

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23792: [SPARK-26882] Check the 
Kubernetes integration tests scalatyle
URL: https://github.com/apache/spark/pull/23792#issuecomment-464270873
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #23792: [SPARK-26882] Check the Kubernetes integration tests scalatyle

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23792: [SPARK-26882] Check the Kubernetes 
integration tests scalatyle
URL: https://github.com/apache/spark/pull/23792#issuecomment-464270877
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102402/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23792: [SPARK-26882] Check the Kubernetes integration tests scalatyle

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23792: [SPARK-26882] Check the Kubernetes 
integration tests scalatyle
URL: https://github.com/apache/spark/pull/23792#issuecomment-464270873
 
 
   Merged build finished. Test PASSed.





[GitHub] SparkQA removed a comment on issue #23792: [SPARK-26882] Check the Kubernetes integration tests scalatyle

2019-02-15 Thread GitBox
SparkQA removed a comment on issue #23792: [SPARK-26882] Check the Kubernetes 
integration tests scalatyle
URL: https://github.com/apache/spark/pull/23792#issuecomment-464192851
 
 
   **[Test build #102402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102402/testReport)** for PR 23792 at commit [`c50f10d`](https://github.com/apache/spark/commit/c50f10d37d66af6fa60b561c0f139bbf558eccfd).





[GitHub] SparkQA commented on issue #23792: [SPARK-26882] Check the Kubernetes integration tests scalatyle

2019-02-15 Thread GitBox
SparkQA commented on issue #23792: [SPARK-26882] Check the Kubernetes 
integration tests scalatyle
URL: https://github.com/apache/spark/pull/23792#issuecomment-464270679
 
 
   **[Test build #102402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102402/testReport)** for PR 23792 at commit [`c50f10d`](https://github.com/apache/spark/commit/c50f10d37d66af6fa60b561c0f139bbf558eccfd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] AmplabJenkins removed a comment on issue #23807: [SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23807: [SPARK-26897][SQL][TEST] 
Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite
URL: https://github.com/apache/spark/pull/23807#issuecomment-464270406
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102406/
   Test FAILed.





[GitHub] SparkQA removed a comment on issue #23807: [SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite

2019-02-15 Thread GitBox
SparkQA removed a comment on issue #23807: [SPARK-26897][SQL][TEST] Update 
Spark 2.3.x testing from HiveExternalCatalogVersionsSuite
URL: https://github.com/apache/spark/pull/23807#issuecomment-464256809
 
 
   **[Test build #102406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102406/testReport)** for PR 23807 at commit [`799a01a`](https://github.com/apache/spark/commit/799a01ac76763549439e3dd32b9dfdd841d10313).





[GitHub] AmplabJenkins commented on issue #23807: [SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23807: [SPARK-26897][SQL][TEST] Update Spark 
2.3.x testing from HiveExternalCatalogVersionsSuite
URL: https://github.com/apache/spark/pull/23807#issuecomment-464270406
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102406/
   Test FAILed.





[GitHub] AmplabJenkins commented on issue #23807: [SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #23807: [SPARK-26897][SQL][TEST] Update Spark 
2.3.x testing from HiveExternalCatalogVersionsSuite
URL: https://github.com/apache/spark/pull/23807#issuecomment-464270404
 
 
   Merged build finished. Test FAILed.





[GitHub] AmplabJenkins removed a comment on issue #23807: [SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #23807: [SPARK-26897][SQL][TEST] 
Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite
URL: https://github.com/apache/spark/pull/23807#issuecomment-464270404
 
 
   Merged build finished. Test FAILed.





[GitHub] SparkQA commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
SparkQA commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to 
launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464270324
 
 
   **[Test build #102407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102407/testReport)** for PR 18339 at commit [`ea267c6`](https://github.com/apache/spark/commit/ea267c68c805951c5ee2fb4fccd9f8fb4a288297).





[GitHub] AmplabJenkins removed a comment on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #18339: [SPARK-21094][PYTHON] Add 
popen_kwargs to launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464270146
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7987/
   Test PASSed.





[GitHub] SparkQA commented on issue #23807: [SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite

2019-02-15 Thread GitBox
SparkQA commented on issue #23807: [SPARK-26897][SQL][TEST] Update Spark 2.3.x 
testing from HiveExternalCatalogVersionsSuite
URL: https://github.com/apache/spark/pull/23807#issuecomment-464270362
 
 
   **[Test build #102406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102406/testReport)** for PR 23807 at commit [`799a01a`](https://github.com/apache/spark/commit/799a01ac76763549439e3dd32b9dfdd841d10313).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] AmplabJenkins removed a comment on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #18339: [SPARK-21094][PYTHON] Add 
popen_kwargs to launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464270142
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs 
to launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464270142
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs 
to launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464270146
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7987/
   Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] 
Keep track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-464269503
 
 
   Merged build finished. Test PASSed.





[GitHub] holdenk commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
holdenk commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to 
launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464269764
 
 
   Jenkins retest this please





[GitHub] holdenk commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2019-02-15 Thread GitBox
holdenk commented on issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to 
launch_gateway
URL: https://github.com/apache/spark/pull/18339#issuecomment-464269807
 
 
   @parente if you could merge in master that would trigger a Jenkins run.





[GitHub] AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-02-15 Thread GitBox
AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] 
Keep track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-464269507
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102401/
   Test PASSed.





[GitHub] holdenk commented on a change in pull request #23741: [SPARK-22798][PYTHON][ML]Add multiple column support to PySpark StringIndexer

2019-02-15 Thread GitBox
holdenk commented on a change in pull request #23741: 
[SPARK-22798][PYTHON][ML]Add multiple column support to PySpark StringIndexer
URL: https://github.com/apache/spark/pull/23741#discussion_r257438402
 
 

 ##
 File path: python/pyspark/ml/wrapper.py
 ##
 @@ -87,9 +87,19 @@ def _new_java_array(pylist, java_class):
   - bool -> sc._gateway.jvm.java.lang.Boolean
 """
 
 Review comment:
   Just a gentle ping on doing this part





[GitHub] AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep 
track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-464269503
 
 
   Merged build finished. Test PASSed.





[GitHub] holdenk commented on a change in pull request #23741: [SPARK-22798][PYTHON][ML]Add multiple column support to PySpark StringIndexer

2019-02-15 Thread GitBox
holdenk commented on a change in pull request #23741: 
[SPARK-22798][PYTHON][ML]Add multiple column support to PySpark StringIndexer
URL: https://github.com/apache/spark/pull/23741#discussion_r257438503
 
 

 ##
 File path: python/pyspark/ml/wrapper.py
 ##
 @@ -87,9 +87,19 @@ def _new_java_array(pylist, java_class):
   - bool -> sc._gateway.jvm.java.lang.Boolean
 """
     sc = SparkContext._active_spark_context
-    java_array = sc._gateway.new_array(java_class, len(pylist))
-    for i in xrange(len(pylist)):
-        java_array[i] = pylist[i]
+    java_array = None
+    if len(pylist) > 0 and isinstance(pylist[0], list):
+        inner_array_length = 0
+        for i in xrange(len(pylist)):
+            inner_array_length = max(inner_array_length, len(pylist[i]))
+        java_array = sc._gateway.new_array(java_class, len(pylist), inner_array_length)
 
 Review comment:
   I think we now have this in 
https://github.com/apache/spark/pull/23741/files#diff-898790f48e214f86080160b45fcf81cfR102
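
   For readers following the hunk quoted above, here is a minimal sketch of the sizing logic it introduces, written without a running JVM. The helper names `array_dimensions` and `fill_rectangular` are hypothetical and are not part of the PR; the sketch assumes, as the hunk does, that a list whose first element is a list contains only lists. The resulting tuple is the set of dimensions that would be passed on as `sc._gateway.new_array(java_class, *dims)`.

```python
def array_dimensions(pylist):
    """Return the dimensions needed to hold a flat or nested Python list.

    Hypothetical helper mirroring the check in the quoted hunk: a list whose
    first element is a list is treated as two-dimensional.
    """
    if len(pylist) > 0 and isinstance(pylist[0], list):
        # Size the inner dimension to the longest inner list so every row fits.
        inner_array_length = max(len(inner) for inner in pylist)
        return (len(pylist), inner_array_length)
    return (len(pylist),)


def fill_rectangular(pylist):
    """Pure-Python stand-in for filling the allocated array.

    Slots that a shorter inner list does not cover are left as None here;
    this only illustrates the shape, not what the JVM stores.
    """
    dims = array_dimensions(pylist)
    if len(dims) == 1:
        return list(pylist)
    rows, cols = dims
    grid = [[None] * cols for _ in range(rows)]
    for i, inner in enumerate(pylist):
        for j, value in enumerate(inner):
            grid[i][j] = value
    return grid


if __name__ == "__main__":
    print(array_dimensions(["a", "b", "c"]))
    print(array_dimensions([["a"], ["b", "c", "d"]]))
    print(fill_rectangular([["a"], ["b", "c", "d"]]))
```

   Taking the maximum inner length keeps the array rectangular, which is why ragged input ends up with unused trailing slots in the shorter rows.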





[GitHub] AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-02-15 Thread GitBox
AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep 
track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-464269507
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102401/
   Test PASSed.





[GitHub] SparkQA removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-02-15 Thread GitBox
SparkQA removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep 
track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-464189231
 
 
   **[Test build #102401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102401/testReport)** for PR 19045 at commit [`25dc907`](https://github.com/apache/spark/commit/25dc90775a50cc462cd5f325c3b3eada5def1808).




