[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 
or more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-566426374
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-566426374
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 
or more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-566426376
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115425/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-566426376
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115425/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox
SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-566425781
 
 
   **[Test build #115425 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115425/testReport)**
 for PR 26918 at commit 
[`327967e`](https://github.com/apache/spark/commit/327967e22a105386bf43e0fac4e9fa74ec70bd4e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox
SparkQA removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or 
more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-566366273
 
 
   **[Test build #115425 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115425/testReport)**
 for PR 26918 at commit 
[`327967e`](https://github.com/apache/spark/commit/327967e22a105386bf43e0fac4e9fa74ec70bd4e).





[GitHub] [spark] steven-aerts commented on a change in pull request #26907: [SPARK-30267][SQL] Avro arrays can be of any List

2019-12-16 Thread GitBox
steven-aerts commented on a change in pull request #26907: [SPARK-30267][SQL] 
Avro arrays can be of any List
URL: https://github.com/apache/spark/pull/26907#discussion_r358641861
 
 

 ##
 File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
 ##
 @@ -167,7 +167,7 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
   case (ARRAY, ArrayType(elementType, containsNull)) =>
 val elementWriter = newWriter(avroType.getElementType, elementType, path)
 (updater, ordinal, value) =>
-  val array = value.asInstanceOf[GenericData.Array[Any]]
+  val array = value.asInstanceOf[java.util.List[Any]]
 
 Review comment:
   Added a test for an array containing structs, which was not yet covered.
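   For illustration, a minimal standalone sketch (hypothetical reader, not the Spark code or the new test) of why casting to the `java.util.List` interface is the more general choice: an Avro decoder may hand back any `List` implementation, not necessarily a `GenericData.Array`.
   ```scala
   import java.util.{ArrayList => JArrayList, Arrays, List => JList}
   
   // Casting to the java.util.List interface accepts any list implementation,
   // whereas asInstanceOf[GenericData.Array[Any]] would throw ClassCastException
   // for lists that are not GenericData.Array.
   def readElements(value: Any): Seq[Any] = {
     val array = value.asInstanceOf[JList[Any]]
     (0 until array.size()).map(i => array.get(i))
   }
   
   val decoded: Any = new JArrayList[Any](Arrays.asList("a", "b"))
   assert(readElements(decoded) == Seq("a", "b"))
   ```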





[GitHub] [spark] zhengruifeng commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-16 Thread GitBox
zhengruifeng commented on issue #26803: [SPARK-30178][ML] RobustScaler support 
large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-566419509
 
 
   Moreover, for the existing impl, we can use `reduceByKey`/`aggregateByKey` with the feature index as the key to generate an `RDD[QuantileSummaries]`, and then compute and fetch the range/median from it instead of building the large `Array[QuantileSummaries]`, as @srowen suggested.
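   A minimal sketch of that idea (a hypothetical helper, not the PR's code; it densifies sparse vectors for brevity and assumes access to Spark's internal `QuantileSummaries`):
   ```scala
   import org.apache.spark.ml.linalg.Vector
   import org.apache.spark.rdd.RDD
   import org.apache.spark.sql.catalyst.util.QuantileSummaries
   
   // Key each feature value by its feature index and build one QuantileSummaries
   // per feature distributively, instead of one large Array[QuantileSummaries].
   def summariesByFeature(
       vectors: RDD[Vector],
       relativeError: Double): RDD[(Int, QuantileSummaries)] = {
     vectors
       .flatMap(v => Iterator.tabulate(v.size)(i => (i, v(i))))
       .aggregateByKey(
         new QuantileSummaries(QuantileSummaries.defaultCompressThreshold, relativeError))(
         (summary, value) => summary.insert(value),
         (s1, s2) => s1.compress().merge(s2.compress()))
   }
   ```
   The per-feature range/median can then be fetched from the resulting pairs without materializing one big array on the driver.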





[GitHub] [spark] AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] 
UnresolvedV2Relation should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921#issuecomment-566419176
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20240/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] 
UnresolvedV2Relation should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921#issuecomment-566419171
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] 
UnresolvedV2Relation should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921#issuecomment-566419171
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] 
UnresolvedV2Relation should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921#issuecomment-566419176
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20240/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first

2019-12-16 Thread GitBox
SparkQA commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation 
should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921#issuecomment-566418536
 
 
   **[Test build #115437 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115437/testReport)**
 for PR 26921 at commit 
[`db311fd`](https://github.com/apache/spark/commit/db311fd3dc95ac79413ced64d94d3caee97e423b).





[GitHub] [spark] cloud-fan commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType

2019-12-16 Thread GitBox
cloud-fan commented on a change in pull request #26811: [SPARK-29600][SQL] 
ArrayContains function may return incorrect result for DecimalType
URL: https://github.com/apache/spark/pull/26811#discussion_r358633880
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
 ##
 @@ -850,7 +850,7 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
 val errorMsg1 =
   s"""
  |Input to function array_contains should have been array followed by a
- |value with same element type, but it's [array, decimal(29,29)].
+ |value with same element type, but it's [array, decimal(38,29)].
 
 Review comment:
   Yeah, I get that we can't cast here. My question is: since we can't cast, we should leave the expression untouched. But now we add a cast to one side and leave the expression unresolved. Where do we add that useless cast?





[GitHub] [spark] zhengruifeng commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-16 Thread GitBox
zhengruifeng commented on issue #26803: [SPARK-30178][ML] RobustScaler support 
large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-566414715
 
 
   test code:
   ```scala
   import org.apache.spark.ml.linalg._
   import org.apache.spark.ml.feature._
   import org.apache.spark.storage.StorageLevel
   
   val rdd = sc.range(0, 1000, 1, 100)
   val df = rdd.map(i => Tuple1.apply(Vectors.dense((i % 1000).toDouble / 1000))).toDF("features")
   df.persist(StorageLevel.MEMORY_AND_DISK)
   df.count
   
   val scaler = new RobustScaler().setInputCol("features")
   
   val start = System.currentTimeMillis; Seq.range(0, 100).foreach { _ => val model = scaler.fit(df) }; val end = System.currentTimeMillis
   
   end - start
   ```
   
   Master: 243493 ms
   This PR: 285341 ms
   I tested an edge case with only numFeatures=1, and the existing impl is about 17% faster than this PR.
   
   That is to say, this PR supports medium/large (>1000) numFeatures at the cost of some performance regression in low-dimensional cases.
   Or should we check numFeatures first and decide which method to use?
   





[GitHub] [spark] AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] 
UnresolvedV2Relation should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921#issuecomment-566413592
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20239/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] 
UnresolvedV2Relation should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921#issuecomment-566413585
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] 
UnresolvedV2Relation should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921#issuecomment-566413585
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] 
UnresolvedV2Relation should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921#issuecomment-566413592
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20239/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first

2019-12-16 Thread GitBox
SparkQA commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation 
should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921#issuecomment-566412969
 
 
   **[Test build #115436 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115436/testReport)**
 for PR 26921 at commit 
[`5d8422d`](https://github.com/apache/spark/commit/5d8422d4dc09832c81117af6e7c12f67c4b04ad4).





[GitHub] [spark] imback82 opened a new pull request #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first

2019-12-16 Thread GitBox
imback82 opened a new pull request #26921: [SPARK-30282][SQL] 
UnresolvedV2Relation should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921
 
 
   
   
   ### What changes were proposed in this pull request?
   
   This is part of the effort to make the relation lookup behavior consistent: [SPARK-29900](https://issues.apache.org/jira/browse/SPARK-29900).
   
   This PR specifically addresses the V2 commands whose logical plan contains `UnresolvedV2Relation`, so that if `UnresolvedV2Relation` is resolved to a temp view, those commands error out with a message that V2 commands cannot handle temp views.
   ### Why are the changes needed?
   
   For the following V2 commands, `Analyzer.ResolveTables` does not check temp views before resolving `UnresolvedV2Relation`, so it always resolves `UnresolvedV2Relation` to a table:
   ```
   ALTER TABLE
   DESCRIBE TABLE
   SHOW TBLPROPERTIES
   ```
   Thus, in the following example, `t` will be resolved to a table, not a temp 
view:
   ```
   sql("CREATE TEMPORARY VIEW t AS SELECT 2 AS i")
   sql("CREATE TABLE testcat.ns.t USING csv AS SELECT 1 AS i")
   sql("USE testcat.ns")
   sql("DESCRIBE t") // 't' is resolved to a table
   ```
   This behavior is inconsistent with other commands which look up temp views 
first.
   
   ### Does this PR introduce any user-facing change?
   
   Yes, now the above example will fail as follows:
   ```
   sql("DESCRIBE t") // 't' is now resolved to a temp view
   org.apache.spark.sql.AnalysisException: A temp view 't' cannot be handled by 
V2 commands.;
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveTables$$resolveV2Relation(Analyzer.scala:782)
   ```
   
   ### How was this patch tested?
   
   Added new tests.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26905: [SPARK-30266][SQL] Avoid 
overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#issuecomment-566411082
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26905: [SPARK-30266][SQL] Avoid 
overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#issuecomment-566411091
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20238/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #25452: [SPARK-28710][SQL]to fix 
replace function, spark should call drop and create function
URL: https://github.com/apache/spark/pull/25452#issuecomment-566410838
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115427/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #25452: [SPARK-28710][SQL]to fix 
replace function, spark should call drop and create function
URL: https://github.com/apache/spark/pull/25452#issuecomment-566410827
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and 
match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#issuecomment-566411082
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and 
match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#issuecomment-566411091
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20238/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #25452: [SPARK-28710][SQL]to fix replace 
function, spark should call drop and create function
URL: https://github.com/apache/spark/pull/25452#issuecomment-566410827
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #25452: [SPARK-28710][SQL]to fix replace 
function, spark should call drop and create function
URL: https://github.com/apache/spark/pull/25452#issuecomment-566410838
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115427/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox
SparkQA commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and match 
error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#issuecomment-566410649
 
 
   **[Test build #115435 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115435/testReport)**
 for PR 26905 at commit 
[`2bcea50`](https://github.com/apache/spark/commit/2bcea5066cd3d2db04f62980517ca6c14164bd7c).





[GitHub] [spark] SparkQA commented on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function

2019-12-16 Thread GitBox
SparkQA commented on issue #25452: [SPARK-28710][SQL]to fix replace function, 
spark should call drop and create function
URL: https://github.com/apache/spark/pull/25452#issuecomment-566410409
 
 
   **[Test build #115427 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115427/testReport)**
 for PR 25452 at commit 
[`95d1e23`](https://github.com/apache/spark/commit/95d1e2391b3c5d8daeb97c41e5eb9a13b73f3743).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function

2019-12-16 Thread GitBox
SparkQA removed a comment on issue #25452: [SPARK-28710][SQL]to fix replace 
function, spark should call drop and create function
URL: https://github.com/apache/spark/pull/25452#issuecomment-566379151
 
 
   **[Test build #115427 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115427/testReport)**
 for PR 25452 at commit 
[`95d1e23`](https://github.com/apache/spark/commit/95d1e2391b3c5d8daeb97c41e5eb9a13b73f3743).





[GitHub] [spark] wypoon commented on a change in pull request #26895: [SPARK-17398][SQL] Fix ClassCastException when querying partitioned JSON table

2019-12-16 Thread GitBox
wypoon commented on a change in pull request #26895: [SPARK-17398][SQL] Fix 
ClassCastException when querying partitioned JSON table
URL: https://github.com/apache/spark/pull/26895#discussion_r358628516
 
 

 ##
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
 ##
 @@ -252,10 +254,14 @@ class HadoopTableReader(
 partProps.asScala.foreach {
   case (key, value) => props.setProperty(key, value)
 }
-deserializer.initialize(hconf, props)
+DeserializerLock.synchronized {
+  deserializer.initialize(hconf, props)
+}
 // get the table deserializer
 val tableSerDe = localTableDesc.getDeserializerClass.getConstructor().newInstance()
-tableSerDe.initialize(hconf, localTableDesc.getProperties)
+DeserializerLock.synchronized {
+  tableSerDe.initialize(hconf, tableProperties)
 
 Review comment:
   Yes, I did find that to be the case in my repro: the two were the same class (JsonSerDe). However, the initialize calls on deserializer and on tableSerDe use potentially different properties (props and tableProperties could differ), so I think I should initialize tableSerDe even if it has the same class as deserializer.
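   For illustration, a generic sketch of the pattern in the diff above (hypothetical types, not the Spark code): route every non-thread-safe initialize() call through one shared lock object, and still initialize each instance with its own properties even when the two are the same class.
   ```scala
   import java.util.Properties
   
   object DeserializerLock  // stand-in for the shared lock object used in TableReader
   
   class FakeSerDe {  // hypothetical serde whose initialize() is not thread-safe
     def initialize(props: Properties): Unit = ()
   }
   
   def initBoth(
       partDeser: FakeSerDe, partProps: Properties,
       tableSerDe: FakeSerDe, tableProps: Properties): Unit = {
     DeserializerLock.synchronized { partDeser.initialize(partProps) }
     // Even if tableSerDe has the same class as partDeser, tableProps may differ
     // from partProps, so tableSerDe still needs its own initialize call.
     DeserializerLock.synchronized { tableSerDe.initialize(tableProps) }
   }
   ```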





[GitHub] [spark] yaooqinn commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-16 Thread GitBox
yaooqinn commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type 
+/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-566410341
 
 
   Hi @gengliangwang, thanks for your suggestion; I have updated the description. Can you check whether it is clear enough?





[GitHub] [spark] AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider 
partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920#issuecomment-566408882
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider 
partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920#issuecomment-566408889
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20237/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider 
partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920#issuecomment-566408889
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20237/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider 
partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920#issuecomment-566408882
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox
SparkQA commented on issue #26869: [SPARK-30235][CORE] Switching off host local 
disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566408400
 
 
   **[Test build #115434 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115434/testReport)**
 for PR 26869 at commit 
[`6258ccd`](https://github.com/apache/spark/commit/6258ccd153fec744968b6a65f6f890dbfe013ad2).





[GitHub] [spark] SparkQA commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource

2019-12-16 Thread GitBox
SparkQA commented on issue #26920: [SPARK-30281][SS] Consider 
partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920#issuecomment-566408383
 
 
   **[Test build #115433 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115433/testReport)**
 for PR 26920 at commit 
[`e779edc`](https://github.com/apache/spark/commit/e779edcd3edec3608456d41caca98f5f7e5884a7).





[GitHub] [spark] JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-16 Thread GitBox
JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] 
optimize skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#discussion_r358626209
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ReduceNumShufflePartitionsSuite.scala
 ##
 @@ -132,7 +133,8 @@ class ReduceNumShufflePartitionsSuite extends SparkFunSuite with BeforeAndAfterA
 Array(
   new MapOutputStatistics(0, bytesByPartitionId1),
   new MapOutputStatistics(1, bytesByPartitionId2))
-  intercept[AssertionError](rule.estimatePartitionStartIndices(mapOutputStatistics))
+ intercept[AssertionError](rule.estimatePartitionStartAndEndIndices(
+   mapOutputStatistics, (0 until bytesByPartitionId1.length).toSet))
 
 Review comment:
   yes, we can delete it now.





[GitHub] [spark] JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-16 Thread GitBox
JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] 
optimize skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#discussion_r358626105
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ReduceNumShufflePartitionsSuite.scala
 ##
 @@ -59,8 +59,9 @@ class ReduceNumShufflePartitionsSuite extends SparkFunSuite with BeforeAndAfterA
   case (bytesByPartitionId, index) =>
 new MapOutputStatistics(index, bytesByPartitionId)
 }
+val length = mapOutputStatistics.map(_.bytesByPartitionId.length).head
 val estimatedPartitionStartIndices =
-  rule.estimatePartitionStartIndices(mapOutputStatistics)
+  rule.estimatePartitionStartAndEndIndices(mapOutputStatistics).unzip._1
 
 Review comment:
   Do you mean we need to check that there are excluded partitions? Currently the check is already for no excluded partitions.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching 
off host local disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566406639
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider 
partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920#issuecomment-566406677
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host 
local disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566406649
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20236/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching 
off host local disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566406649
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20236/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider 
partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920#issuecomment-566406685
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20235/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider 
partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920#issuecomment-566406685
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20235/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider 
partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920#issuecomment-566406677
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host 
local disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566406639
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource

2019-12-16 Thread GitBox
SparkQA commented on issue #26920: [SPARK-30281][SS] Consider 
partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920#issuecomment-566406291
 
 
   **[Test build #115432 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115432/testReport)**
 for PR 26920 at commit 
[`4b99e61`](https://github.com/apache/spark/commit/4b99e612e994073e995a337a94db123193af2b5f).





[GitHub] [spark] HeartSaVioR commented on a change in pull request #22952: [SPARK-20568][SS] Provide option to clean up completed files in streaming query

2019-12-16 Thread GitBox
HeartSaVioR commented on a change in pull request #22952: [SPARK-20568][SS] 
Provide option to clean up completed files in streaming query
URL: https://github.com/apache/spark/pull/22952#discussion_r358624511
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
 ##
 @@ -330,4 +341,96 @@ object FileStreamSource {
 
 def size: Int = map.size()
   }
+
+  private[sql] trait FileStreamSourceCleaner {
+def clean(entry: FileEntry): Unit
+  }
+
+  private[sql] object FileStreamSourceCleaner {
+def apply(
+fileSystem: FileSystem,
+sourcePath: Path,
+option: FileStreamOptions,
+hadoopConf: Configuration): Option[FileStreamSourceCleaner] = option.cleanSource match {
+  case CleanSourceMode.ARCHIVE =>
+require(option.sourceArchiveDir.isDefined)
+val path = new Path(option.sourceArchiveDir.get)
+val archiveFs = path.getFileSystem(hadoopConf)
+val qualifiedArchivePath = archiveFs.makeQualified(path)
+Some(new SourceFileArchiver(fileSystem, sourcePath, archiveFs, qualifiedArchivePath))
+
+  case CleanSourceMode.DELETE =>
+Some(new SourceFileRemover(fileSystem))
+
+  case _ => None
+}
+  }
+
+  private[sql] class SourceFileArchiver(
+  fileSystem: FileSystem,
+  sourcePath: Path,
+  baseArchiveFileSystem: FileSystem,
+  baseArchivePath: Path) extends FileStreamSourceCleaner with Logging {
+assertParameters()
+
+private def assertParameters(): Unit = {
+  require(fileSystem.getUri == baseArchiveFileSystem.getUri, "Base archive path is located " +
+s"on a different file system than the source files. source path: $sourcePath" +
+s" / base archive path: $baseArchivePath")
+
+  /**
+   * FileStreamSource reads the files which one of below conditions is met:
+   * 1) file itself is matched with source path
+   * 2) parent directory is matched with source path
 
 Review comment:
   FYI, just filed https://issues.apache.org/jira/browse/SPARK-30281 and raised a patch picking option 2: #26845


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR opened a new pull request #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource

2019-12-16 Thread GitBox
HeartSaVioR opened a new pull request #26920: [SPARK-30281][SS] Consider 
partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920
 
 
   ### What changes were proposed in this pull request?
   
   This patch reworks the verification logic of the archive path for FileStreamSource, as we found the existing logic doesn't take the partitioned/recursive options into account.
   
   Before the patch, it only requires the archive path to have a depth greater than 2 (two subdirectories from the root), leveraging the fact that FileStreamSource normally reads files whose parent directory matches the pattern or which match the pattern themselves. Given that the 'archive' operation moves files to the base archive path while retaining their full path, an archive path tends to be safe if its depth is greater than 2, meaning FileStreamSource doesn't re-read archived files as new source files.
   
   With the partitioned/recursive options, that assumption no longer holds, as FileStreamSource can read files at any depth of subdirectories under the source pattern. To deal with this correctly, we have to rework the verification logic; the new logic may not be intuitive or simple, but it works for all cases.
   
   The new verification logic prevents both of the following cases:
   
   1) the archive path matches the source pattern as a "prefix" (the depth of the archive path > the depth of the source pattern)
   
   e.g.
   * source pattern: `/hello*/spar?`
   * archive path: `/hello/spark/structured/streaming`
   
   Any file in the archive path will match the source pattern when the recursive option is enabled.
   
   2) the source pattern matches the archive path as a "prefix" (the depth of the source pattern > the depth of the archive path)
   
   e.g.
   * source pattern: `/hello*/spar?/structured/hello2*`
   * archive path: `/hello/spark/structured`
   
   Some archived files will not match the source pattern, e.g. for file path `/hello/spark/structured/hello2` the final archived path is `/hello/spark/structured/hello/spark/structured/hello2`.
   
   But some other archived files will still match the source pattern, e.g. for file path `/hello2/spark/structured/hello2` the final archived path is `/hello/spark/structured/hello2/spark/structured/hello2`, which matches the source pattern when recursive is enabled.
   
   Implicitly it also prevents the archive path from matching the source pattern exactly (same depth).
   
   We want to prevent any source file from being archived and then picked up again as a new source file, so the patch takes the most restrictive approach to rule out all of these cases.
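   
   To make the "prefix" check concrete, here is a minimal sketch (not the code in this patch) of how one could test whether an archive path may clash with a source glob pattern, assuming Hadoop's `GlobPattern` for per-component matching:
   
   ```scala
   import org.apache.hadoop.fs.{GlobPattern, Path}
   
   // Returns true when every leading component of the archive path matches the
   // corresponding component of the source glob, i.e. one side is a "prefix" of
   // the other and archived files could be picked up again as source files.
   def mayClash(sourceGlob: String, archiveDir: String): Boolean = {
     def components(p: String): Seq[String] =
       new Path(p).toUri.getPath.stripPrefix("/").split("/").toSeq
     components(sourceGlob).zip(components(archiveDir)).forall {
       case (glob, dir) => new GlobPattern(glob).matches(dir)
     }
   }
   
   mayClash("/hello*/spar?", "/hello/spark/structured/streaming")          // true
   mayClash("/hello*/spar?/structured/hello2*", "/hello/spark/structured") // true
   ```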
   
   ### Why are the changes needed?
   
   Without this patch, archived files may be picked up as new source files when the partitioned/recursive option is enabled, because the current condition doesn't take these options into account.
   
   ### Does this PR introduce any user-facing change?
   
   Only for Spark 3.0.0-preview 1: end users are now required to provide an archive path that satisfies slightly more complicated conditions, instead of simply one deeper than two levels.
   
   ### How was this patch tested?
   
   New UT.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support 
ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#issuecomment-566404516
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support 
ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#issuecomment-566404525
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20234/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL 
filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#issuecomment-566404525
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20234/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL 
filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#issuecomment-566404516
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-16 Thread GitBox
beliefer commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter 
clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#issuecomment-566404539
 
 
   @cloud-fan @maropu @viirya @dongjoon-hyun To make the review more convenient and easy, the current PR only supports situations where DISTINCT and FILTER do not occur at the same time. I created another ticket, SPARK-30276, and will open a follow-up PR to support DISTINCT and FILTER occurring at the same time.
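   
   For readers unfamiliar with the clause, a minimal illustration of the syntax under discussion (hypothetical table `t` with columns `a` and `b`; combining DISTINCT with FILTER is deferred to SPARK-30276 as noted above):
   
   ```scala
   // ANSI SQL filter clause: aggregate only the rows whose FILTER predicate holds.
   // Runs in spark-shell, where `spark` is the provided SparkSession.
   spark.sql("SELECT sum(a) FILTER (WHERE b > 0) AS sum_a_where_b_positive FROM t").show()
   ```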


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-16 Thread GitBox
SparkQA commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter 
clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#issuecomment-566404107
 
 
   **[Test build #115431 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115431/testReport)**
 for PR 26656 at commit 
[`b29ef0f`](https://github.com/apache/spark/commit/b29ef0fd901ccebafd176492a61f0b6d96e330ba).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on issue #26882: [SPARK-30247][PySpark] GaussianMixtureModel in py side should expose gaussian

2019-12-16 Thread GitBox
zhengruifeng commented on issue #26882: [SPARK-30247][PySpark] 
GaussianMixtureModel in py side should expose gaussian
URL: https://github.com/apache/spark/pull/26882#issuecomment-566404111
 
 
   Sorry for the late reply.
   It seems we need to add a corresponding `MultivariateGaussian` class containing a vector and a matrix on the Python side; otherwise the `gaussians` cannot be used from Python:
   ```python
   In [8]: model.gaussians
   Out[8]: 
   [{'__class__': 'org.apache.spark.ml.stat.distribution.MultivariateGaussian'},
{'__class__': 'org.apache.spark.ml.stat.distribution.MultivariateGaussian'},
{'__class__': 'org.apache.spark.ml.stat.distribution.MultivariateGaussian'}]
   ```
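   
   For reference, on the Scala side each element of `gaussians` exposes a mean vector and a covariance matrix, which is what a Python counterpart would need to mirror. A small sketch (assuming `df` is a DataFrame with a `features` vector column):
   
   ```scala
   import org.apache.spark.ml.clustering.GaussianMixture
   
   val model = new GaussianMixture().setK(3).fit(df)
   model.gaussians.foreach { g =>
     // g.mean: org.apache.spark.ml.linalg.Vector, g.cov: org.apache.spark.ml.linalg.Matrix
     println(s"mean = ${g.mean}\ncov =\n${g.cov}")
   }
   ```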


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox
yaooqinn commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size 
field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566403842
 
 
   thanks for merging and reviewing.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox
maropu commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size 
field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566403582
 
 
   Thanks! Merged to master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu closed pull request #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox
maropu closed pull request #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size 
field for interval column cache
URL: https://github.com/apache/spark/pull/26906
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove 
size field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566402107
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115420/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors

2019-12-16 Thread GitBox
zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] 
Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors
URL: https://github.com/apache/spark/pull/26858#discussion_r358620968
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala
 ##
 @@ -138,21 +139,31 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
       // Limit the use of hashDist since it's controversial
       val hashDistUDF = udf((x: Seq[Vector]) => hashDistance(x, keyHash), DataTypes.DoubleType)
       val hashDistCol = hashDistUDF(col($(outputCol)))
-
-      // Compute threshold to get around k elements.
-      // To guarantee to have enough neighbors in one pass, we need (p - err) * N >= M
-      // so we pick quantile p = M / N + err
-      // M: the number of nearest neighbors; N: the number of elements in dataset
-      val relativeError = 0.05
-      val approxQuantile = numNearestNeighbors.toDouble / count + relativeError
       val modelDatasetWithDist = modelDataset.withColumn(distCol, hashDistCol)
-      if (approxQuantile >= 1) {
-        modelDatasetWithDist
+      // for a small dataset, use BoundedPriorityQueue
+      if (count < 1000) {
+        val queue = new BoundedPriorityQueue[Double](count.toInt)(Ordering[Double])
 
 Review comment:
   A slight performance gain may come from the fact that `BoundedPriorityQueue` does not need a `count` job to compute the variable `approxQuantile`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] 
Remove size field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566402102
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] 
Remove size field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566402107
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115420/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove 
size field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566402102
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox
SparkQA removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove 
size field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566350550
 
 
   **[Test build #115420 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115420/testReport)**
 for PR 26906 at commit 
[`11b7f71`](https://github.com/apache/spark/commit/11b7f718e53c17e9a7c2946bddcaf8b860562d31).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox
SparkQA commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size 
field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566401668
 
 
   **[Test build #115420 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115420/testReport)**
 for PR 26906 at commit 
[`11b7f71`](https://github.com/apache/spark/commit/11b7f718e53c17e9a7c2946bddcaf8b860562d31).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors

2019-12-16 Thread GitBox
zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] 
Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors
URL: https://github.com/apache/spark/pull/26858#discussion_r358620329
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala
 ##
 @@ -138,21 +139,31 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
       // Limit the use of hashDist since it's controversial
       val hashDistUDF = udf((x: Seq[Vector]) => hashDistance(x, keyHash), DataTypes.DoubleType)
       val hashDistCol = hashDistUDF(col($(outputCol)))
-
-      // Compute threshold to get around k elements.
-      // To guarantee to have enough neighbors in one pass, we need (p - err) * N >= M
-      // so we pick quantile p = M / N + err
-      // M: the number of nearest neighbors; N: the number of elements in dataset
-      val relativeError = 0.05
-      val approxQuantile = numNearestNeighbors.toDouble / count + relativeError
       val modelDatasetWithDist = modelDataset.withColumn(distCol, hashDistCol)
-      if (approxQuantile >= 1) {
-        modelDatasetWithDist
+      // for a small dataset, use BoundedPriorityQueue
+      if (count < 1000) {
+        val queue = new BoundedPriorityQueue[Double](count.toInt)(Ordering[Double])
 
 Review comment:
   I wrongly thought that `approxNearestNeighbors` only returns an approximate threshold, so we could use top-k to obtain an exact one.
   Since `approxNearestNeighbors` already guarantees a sufficient threshold that takes the relative error into account, **I guess we no longer need a top-k solution.**
   A `BoundedPriorityQueue` only maintains the top-k entries, so it should be much smaller than a `QuantileSummaries`; however, since there is only one column to process, there should be no performance gain.
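   
   As a side note, a minimal sketch of the top-k idea with `BoundedPriorityQueue` (which is `private[spark]`, so this assumes code living inside Spark itself, and it is not the exact usage in the patch): keep the k smallest distances and use the largest retained one as the threshold.
   
   ```scala
   import org.apache.spark.util.BoundedPriorityQueue
   
   val k = 3
   // Reverse ordering so the queue evicts the numerically largest element first,
   // i.e. it retains the k smallest distances seen so far.
   val queue = new BoundedPriorityQueue[Double](k)(Ordering[Double].reverse)
   Seq(0.9, 0.1, 0.5, 0.3, 0.7).foreach(queue += _)
   val threshold = queue.max  // 0.5, the k-th smallest distance
   ```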
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors

2019-12-16 Thread GitBox
zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] 
Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors
URL: https://github.com/apache/spark/pull/26858#discussion_r358620083
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala
 ##
 @@ -138,21 +139,31 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
       // Limit the use of hashDist since it's controversial
       val hashDistUDF = udf((x: Seq[Vector]) => hashDistance(x, keyHash), DataTypes.DoubleType)
       val hashDistCol = hashDistUDF(col($(outputCol)))
-
-      // Compute threshold to get around k elements.
-      // To guarantee to have enough neighbors in one pass, we need (p - err) * N >= M
-      // so we pick quantile p = M / N + err
-      // M: the number of nearest neighbors; N: the number of elements in dataset
-      val relativeError = 0.05
-      val approxQuantile = numNearestNeighbors.toDouble / count + relativeError
       val modelDatasetWithDist = modelDataset.withColumn(distCol, hashDistCol)
-      if (approxQuantile >= 1) {
-        modelDatasetWithDist
+      // for a small dataset, use BoundedPriorityQueue
+      if (count < 1000) {
+        val queue = new BoundedPriorityQueue[Double](count.toInt)(Ordering[Double])
 
 Review comment:
   I wrongly thought that `approxNearestNeighbors` only returns an approximate threshold, so we could use top-k to obtain an exact one.
   Since `approxNearestNeighbors` already guarantees a sufficient threshold that takes the relative error into account, **I guess we no longer need a top-k solution.**
   A `BoundedPriorityQueue` only maintains the top-k entries, so it should be much smaller than a `QuantileSummaries`; however, since there is only one column to process, there should be no performance gain.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized

2019-12-16 Thread GitBox
SparkQA commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait 
time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-566399899
 
 
   **[Test build #115430 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115430/testReport)**
 for PR 26696 at commit 
[`168ab30`](https://github.com/apache/spark/commit/168ab306c9df7568a1075ef3314a47dbec26ed95).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] bmarcott commented on a change in pull request #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized

2019-12-16 Thread GitBox
bmarcott commented on a change in pull request #26696: [WIP][SPARK-18886][CORE] 
Make locality wait time be the time since a TSM's available slots were fully 
utilized
URL: https://github.com/apache/spark/pull/26696#discussion_r358617814
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
 ##
 @@ -403,6 +406,9 @@ private[spark] class TaskSchedulerImpl(
       if (!executorIdToRunningTaskIds.contains(o.executorId)) {
         hostToExecutors(o.host) += o.executorId
         executorAdded(o.executorId, o.host)
+        // Assumes the first offer will include all cores (free cores == all cores)
 
 Review comment:
   Can anyone confirm whether it is true that the first resource offer for an executor will include all of its cores?
   Even if this is true, it feels odd to rely on it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType

2019-12-16 Thread GitBox
amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] 
ArrayContains function may return incorrect result for DecimalType
URL: https://github.com/apache/spark/pull/26811#discussion_r358618026
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
 ##
 @@ -850,7 +850,7 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSparkSession {
 val errorMsg1 =
   s"""
  |Input to function array_contains should have been array followed by a
- |value with same element type, but it's [array, decimal(29,29)].
+ |value with same element type, but it's [array, decimal(38,29)].
 
 Review comment:
   cc @maropu @cloud-fan 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType

2019-12-16 Thread GitBox
amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] 
ArrayContains function may return incorrect result for DecimalType
URL: https://github.com/apache/spark/pull/26811#discussion_r358076319
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
 ##
 @@ -850,7 +850,7 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSparkSession {
 val errorMsg1 =
   s"""
  |Input to function array_contains should have been array followed by a
- |value with same element type, but it's [array, decimal(29,29)].
+ |value with same element type, but it's [array, decimal(38,29)].
 
 Review comment:
   cc @cloud-fan @maropu 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26696: [WIP][SPARK-18886][CORE] Make 
locality wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-566398230
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20233/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType

2019-12-16 Thread GitBox
amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] 
ArrayContains function may return incorrect result for DecimalType
URL: https://github.com/apache/spark/pull/26811#discussion_r358617660
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
 ##
 @@ -850,7 +850,7 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSparkSession {
 val errorMsg1 =
   s"""
  |Input to function array_contains should have been array followed by a
- |value with same element type, but it's [array, decimal(29,29)].
+ |value with same element type, but it's [array, decimal(38,29)].
 
 Review comment:
   Do you mean why, in the test case query above, `ArrayContains` throws an `AnalysisException` instead of casting the integer to a decimal?
   
   An integer cannot be cast to a decimal with scale > 28.
   
   ```
   decimalWith28Zeroes = 1.
   SELECT array_contains(array(1), decimalWith28Zeroes);
   Result =>> true
   ```
   
   ```
   decimalWith29Zeroes = 1.0
   SELECT array_contains(array(1), decimalWith29Zeroes);
   Result =>> AnalysisException
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26696: [WIP][SPARK-18886][CORE] Make 
locality wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-566398221
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality 
wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-566398230
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20233/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality 
wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-566398221
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] bmarcott commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized

2019-12-16 Thread GitBox
bmarcott commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait 
time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-566398156
 
 
   Back from vacation; I updated the PR, trying to address all the comments so far.
   Let me know if I left any of your comments out.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams

2019-12-16 Thread GitBox
SparkQA commented on issue #26838: [SPARK-30144][ML][PySpark] Make 
MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
URL: https://github.com/apache/spark/pull/26838#issuecomment-566395699
 
 
   **[Test build #115429 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115429/testReport)**
 for PR 26838 at commit 
[`7a98ffb`](https://github.com/apache/spark/commit/7a98ffbc780770bfb6454ea72a597b1c0fb168d1).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make 
MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
URL: https://github.com/apache/spark/pull/26838#issuecomment-566394210
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20232/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make 
MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
URL: https://github.com/apache/spark/pull/26838#issuecomment-566394205
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] 
Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
URL: https://github.com/apache/spark/pull/26838#issuecomment-566394210
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20232/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] 
Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
URL: https://github.com/apache/spark/pull/26838#issuecomment-566394205
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox
maropu commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL 
document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566394194
 
 
   How about adding the docs about the SQL case, too? 
https://github.com/apache/spark/pull/25464


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams

2019-12-16 Thread GitBox
huaxingao commented on issue #26838: [SPARK-30144][ML][PySpark] Make 
MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
URL: https://github.com/apache/spark/pull/26838#issuecomment-566393965
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] planga82 commented on a change in pull request #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2019-12-16 Thread GitBox
planga82 commented on a change in pull request #26890: [SPARK-30039][SQL] 
CREATE FUNCTION should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#discussion_r358613325
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
 ##
 @@ -474,48 +474,48 @@ class ResolveSessionCatalog(
 tbl.asTableIdentifier,
 propertyKey)
 
-case DescribeFunctionStatement(CatalogAndIdentifier(catalog, 
functionIdent), extended) =>
-  val functionIdentifier = if (isSessionCatalog(catalog)) {
-functionIdent.asMultipartIdentifier match {
-  case Seq(db, fn) => FunctionIdentifier(fn, Some(db))
-  case Seq(fn) => FunctionIdentifier(fn, None)
-  case _ =>
-throw new AnalysisException(s"Unsupported function name 
'${functionIdent.quoted}'")
-}
-  } else {
-throw new AnalysisException ("DESCRIBE FUNCTION is only supported in 
v1 catalog")
-  }
-  DescribeFunctionCommand(functionIdentifier, extended)
+case DescribeFunctionStatement(CatalogAndIdentifier(catalog, ident), 
extended) =>
+  val functionIdent =
+parseSessionCatalogFunctionIdentifier("DESCRIBE FUNCTION", catalog, 
ident)
+  DescribeFunctionCommand(functionIdent, extended)
 
 case ShowFunctionsStatement(userScope, systemScope, pattern, fun) =>
   val (database, function) = fun match {
-case Some(CatalogAndIdentifier(catalog, functionIdent)) =>
-  if (isSessionCatalog(catalog)) {
-functionIdent.asMultipartIdentifier match {
-  case Seq(db, fn) => (Some(db), Some(fn))
-  case Seq(fn) => (None, Some(fn))
-  case _ =>
-throw new AnalysisException(s"Unsupported function name 
'${functionIdent.quoted}'")
-}
-  } else {
-throw new AnalysisException ("SHOW FUNCTIONS is only supported in 
v1 catalog")
-  }
+case Some(CatalogAndIdentifier(catalog, ident)) =>
+  val FunctionIdentifier(fn, db) =
+parseSessionCatalogFunctionIdentifier("SHOW FUNCTIONS", catalog, 
ident)
+  (db, Some(fn))
 case None => (None, pattern)
   }
   ShowFunctionsCommand(database, function, userScope, systemScope)
 
-case DropFunctionStatement(CatalogAndIdentifier(catalog, functionIdent), 
ifExists, isTemp) =>
-  if (isSessionCatalog(catalog)) {
-val (database, function) = functionIdent.asMultipartIdentifier match {
-  case Seq(db, fn) => (Some(db), fn)
-  case Seq(fn) => (None, fn)
-  case _ =>
-throw new AnalysisException(s"Unsupported function name 
'${functionIdent.quoted}'")
-}
-DropFunctionCommand(database, function, ifExists, isTemp)
-  } else {
-throw new AnalysisException("DROP FUNCTION is only supported in v1 
catalog")
+case DropFunctionStatement(CatalogAndIdentifier(catalog, ident), ifExists, 
isTemp) =>
+  val FunctionIdentifier(function, database) =
+parseSessionCatalogFunctionIdentifier("DROP FUNCTION", catalog, ident)
+  DropFunctionCommand(database, function, ifExists, isTemp)
+
+case CreateFunctionStatement(CatalogAndIdentifier(catalog, ident),
+  className, resources, isTemp, ignoreIfExists, replace) =>
+  val FunctionIdentifier(function, database) =
+parseSessionCatalogFunctionIdentifier("CREATE FUNCTION", catalog, 
ident)
+  CreateFunctionCommand(database, function, className, resources, isTemp, 
ignoreIfExists,
+replace)
+  }
+
+  private def parseSessionCatalogFunctionIdentifier(
+  sql: String,
+  catalog: CatalogPlugin,
+  functionIdent: Identifier): FunctionIdentifier = {
+if (isSessionCatalog(catalog)) {
+  functionIdent.asMultipartIdentifier match {
+case Seq(db, fn) => FunctionIdentifier(fn, Some(db))
+case Seq(fn) => FunctionIdentifier(fn, None)
+case _ =>
+  throw new AnalysisException(s"Unsupported function name 
'${functionIdent.quoted}'")
 
 Review comment:
   In the SHOW COLUMNS statement we decided on this message:
   > Namespace name should have only one part if specified
   
   How about using the same or a similar message here? Either would be clearer.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] planga82 commented on a change in pull request #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2019-12-16 Thread GitBox
planga82 commented on a change in pull request #26890: [SPARK-30039][SQL] 
CREATE FUNCTION should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#discussion_r358613325
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
 ##
 @@ -474,48 +474,48 @@ class ResolveSessionCatalog(
 tbl.asTableIdentifier,
 propertyKey)
 
-case DescribeFunctionStatement(CatalogAndIdentifier(catalog, 
functionIdent), extended) =>
-  val functionIdentifier = if (isSessionCatalog(catalog)) {
-functionIdent.asMultipartIdentifier match {
-  case Seq(db, fn) => FunctionIdentifier(fn, Some(db))
-  case Seq(fn) => FunctionIdentifier(fn, None)
-  case _ =>
-throw new AnalysisException(s"Unsupported function name 
'${functionIdent.quoted}'")
-}
-  } else {
-throw new AnalysisException ("DESCRIBE FUNCTION is only supported in 
v1 catalog")
-  }
-  DescribeFunctionCommand(functionIdentifier, extended)
+case DescribeFunctionStatement(CatalogAndIdentifier(catalog, ident), 
extended) =>
+  val functionIdent =
+parseSessionCatalogFunctionIdentifier("DESCRIBE FUNCTION", catalog, 
ident)
+  DescribeFunctionCommand(functionIdent, extended)
 
 case ShowFunctionsStatement(userScope, systemScope, pattern, fun) =>
   val (database, function) = fun match {
-case Some(CatalogAndIdentifier(catalog, functionIdent)) =>
-  if (isSessionCatalog(catalog)) {
-functionIdent.asMultipartIdentifier match {
-  case Seq(db, fn) => (Some(db), Some(fn))
-  case Seq(fn) => (None, Some(fn))
-  case _ =>
-throw new AnalysisException(s"Unsupported function name 
'${functionIdent.quoted}'")
-}
-  } else {
-throw new AnalysisException ("SHOW FUNCTIONS is only supported in 
v1 catalog")
-  }
+case Some(CatalogAndIdentifier(catalog, ident)) =>
+  val FunctionIdentifier(fn, db) =
+parseSessionCatalogFunctionIdentifier("SHOW FUNCTIONS", catalog, 
ident)
+  (db, Some(fn))
 case None => (None, pattern)
   }
   ShowFunctionsCommand(database, function, userScope, systemScope)
 
-case DropFunctionStatement(CatalogAndIdentifier(catalog, functionIdent), 
ifExists, isTemp) =>
-  if (isSessionCatalog(catalog)) {
-val (database, function) = functionIdent.asMultipartIdentifier match {
-  case Seq(db, fn) => (Some(db), fn)
-  case Seq(fn) => (None, fn)
-  case _ =>
-throw new AnalysisException(s"Unsupported function name 
'${functionIdent.quoted}'")
-}
-DropFunctionCommand(database, function, ifExists, isTemp)
-  } else {
-throw new AnalysisException("DROP FUNCTION is only supported in v1 
catalog")
+case DropFunctionStatement(CatalogAndIdentifier(catalog, ident), ifExists, 
isTemp) =>
+  val FunctionIdentifier(function, database) =
+parseSessionCatalogFunctionIdentifier("DROP FUNCTION", catalog, ident)
+  DropFunctionCommand(database, function, ifExists, isTemp)
+
+case CreateFunctionStatement(CatalogAndIdentifier(catalog, ident),
+  className, resources, isTemp, ignoreIfExists, replace) =>
+  val FunctionIdentifier(function, database) =
+parseSessionCatalogFunctionIdentifier("CREATE FUNCTION", catalog, 
ident)
+  CreateFunctionCommand(database, function, className, resources, isTemp, 
ignoreIfExists,
+replace)
+  }
+
+  private def parseSessionCatalogFunctionIdentifier(
+  sql: String,
+  catalog: CatalogPlugin,
+  functionIdent: Identifier): FunctionIdentifier = {
+if (isSessionCatalog(catalog)) {
+  functionIdent.asMultipartIdentifier match {
+case Seq(db, fn) => FunctionIdentifier(fn, Some(db))
+case Seq(fn) => FunctionIdentifier(fn, None)
+case _ =>
+  throw new AnalysisException(s"Unsupported function name 
'${functionIdent.quoted}'")
 
 Review comment:
   In the SHOW COLUMNS statement we decided on this message:
   > Namespace name should have only one part if specified
   
   How about using the same or a similar message here? Either would be clearer.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on issue #26807: [SPARK-30181][SQL] Only add string or integral type column to metastore partition filter

2019-12-16 Thread GitBox
cloud-fan commented on issue #26807: [SPARK-30181][SQL] Only add string or 
integral type column to metastore partition filter
URL: https://github.com/apache/spark/pull/26807#issuecomment-566391709
 
 
   Did you use the latest branch-2.4? I'm a little confused about why the bug is still there. Whether we do the cast or not, we check the type of the column, not the type of the cast expression.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox
yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid 
overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358607182
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ##
 @@ -83,32 +83,37 @@ case class ApproximatePercentile(
   }
 
   // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
-  private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int]
-
-  override def inputTypes: Seq[AbstractDataType] = {
-    // Support NumericType, DateType and TimestampType since their internal types are all numeric,
-    // and can be easily cast to double for processing.
-    Seq(TypeCollection(NumericType, DateType, TimestampType),
-      TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType)
-  }
+  private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue()
 
   // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
   private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) =
-    percentageExpression.eval() match {
-      // Rule ImplicitTypeCasts can cast other numeric types to double
-      case num: Double => (false, Array(num))
-      case arrayData: ArrayData => (true, arrayData.toDoubleArray())
+    percentageExpression.dataType match {
+      case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double]))
+      case _: NumericType =>
 
 Review comment:
   OK, I'll check on this.
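   
   For context on the hunk above, a small sketch of why evaluating the accuracy literal as a `Number` is more robust than assuming `Int` (hypothetical values, not the PR's tests):
   
   ```scala
   // The old code assumed the evaluated accuracy literal is always a boxed Int.
   val evaluated: Any = 10000L                    // e.g. the literal came in as a BIGINT
   // evaluated.asInstanceOf[Int]                 // would throw ClassCastException
   val accuracy: Long = evaluated.asInstanceOf[Number].longValue()  // 10000
   ```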


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default

2019-12-16 Thread GitBox
dongjoon-hyun commented on a change in pull request #26919: [SPARK-30280][DOC] 
Update docs for make Hive 2.3 dependency by default
URL: https://github.com/apache/spark/pull/26919#discussion_r358606356
 
 

 ##
 File path: docs/building-spark.md
 ##
 @@ -83,13 +83,10 @@ Example:
 
 To enable Hive integration for Spark SQL along with its JDBC server and CLI,
 add the `-Phive` and `-Phive-thriftserver` profiles to your existing build 
options.
-By default, Spark will use Hive 1.2.1 with the `hadoop-2.7` profile, and Hive 
2.3.6 with the `hadoop-3.2` profile.
-
-# With Hive 1.2.1 support
-./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
+By default Spark will build with Hive 2.3.6.
 
 # With Hive 2.3.6 support
-./build/mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-3.2 -DskipTests 
clean package
+./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
 
 Review comment:
   Shall we add the Hive 1.2.1 example back after this for users?
   ```
   # With Hive 1.2.1 support
   ./build/mvn -Pyarn -Phive -Phive-thriftserver -Phive-1.2 -DskipTests clean 
package
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default

2019-12-16 Thread GitBox
dongjoon-hyun commented on a change in pull request #26919: [SPARK-30280][DOC] 
Update docs for make Hive 2.3 dependency by default
URL: https://github.com/apache/spark/pull/26919#discussion_r358606489
 
 

 ##
 File path: docs/building-spark.md
 ##
 @@ -83,13 +83,10 @@ Example:
 
 To enable Hive integration for Spark SQL along with its JDBC server and CLI,
 add the `-Phive` and `-Phive-thriftserver` profiles to your existing build 
options.
-By default, Spark will use Hive 1.2.1 with the `hadoop-2.7` profile, and Hive 
2.3.6 with the `hadoop-3.2` profile.
-
-# With Hive 1.2.1 support
-./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
+By default Spark will build with Hive 2.3.6.
 
 # With Hive 2.3.6 support
-./build/mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-3.2 -DskipTests 
clean package
+./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
 
 Review comment:
   cc @gatorsmile and @srowen 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox
cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] 
Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358606175
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ##
 @@ -83,32 +83,37 @@ case class ApproximatePercentile(
   }
 
   // Mark as lazy so that accuracyExpression is not evaluated during tree 
transformation.
-  private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int]
-
-  override def inputTypes: Seq[AbstractDataType] = {
-// Support NumericType, DateType and TimestampType since their internal 
types are all numeric,
-// and can be easily cast to double for processing.
-Seq(TypeCollection(NumericType, DateType, TimestampType),
-  TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType)
-  }
+  private lazy val accuracy: Long = 
accuracyExpression.eval().asInstanceOf[Number].longValue()
 
   // Mark as lazy so that percentageExpression is not evaluated during tree 
transformation.
   private lazy val (returnPercentileArray: Boolean, percentages: 
Array[Double]) =
-percentageExpression.eval() match {
-  // Rule ImplicitTypeCasts can cast other numeric types to double
-  case num: Double => (false, Array(num))
-  case arrayData: ArrayData => (true, arrayData.toDoubleArray())
+percentageExpression.dataType match {
+  case DoubleType => (false, 
Array(percentageExpression.eval().asInstanceOf[Double]))
+  case _: NumericType =>
 
 Review comment:
   We can do custom type coercion (i.e., not extend ImplicitCastInputTypes) and 
disallow implicitly casting long to int for this function.
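
   As an illustration of the overflow concern only, here is a small standalone 
Scala sketch, with hypothetical names throughout; it is not Spark code and not 
the coercion rule described above. The accuracy value is widened to Long and 
range-checked explicitly instead of being narrowed to Int, so an out-of-range 
literal fails loudly rather than silently wrapping.

   ```scala
   // Standalone sketch, hypothetical names throughout; Spark's real fix lives
   // in ApproximatePercentile and its type-coercion rules, not in this object.
   object AccuracySketch {
     // Accept any numeric accuracy, keep it as a Long, and reject values that
     // would not fit in an Int instead of letting a cast truncate them.
     def validateAccuracy(value: Any): Either[String, Long] = value match {
       case n: java.lang.Number =>
         val accuracy = n.longValue()  // widen; never narrow to Int here
         if (accuracy > 0 && accuracy <= Int.MaxValue) Right(accuracy)
         else Left(s"accuracy must be in (0, ${Int.MaxValue}], got $accuracy")
       case other =>
         Left(s"accuracy must be numeric, got ${other.getClass.getSimpleName}")
     }

     def main(args: Array[String]): Unit = {
       println(validateAccuracy(10000))        // Right(10000)
       println(validateAccuracy(3000000000L))  // Left(...): would overflow an Int
       println(validateAccuracy("high"))       // Left(...): not numeric
     }
   }
   ```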


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26919: [SPARK-30280][DOC] Update docs 
for make Hive 2.3 dependency by default
URL: https://github.com/apache/spark/pull/26919#issuecomment-566388749
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115428/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default

2019-12-16 Thread GitBox
AmplabJenkins removed a comment on issue #26919: [SPARK-30280][DOC] Update docs 
for make Hive 2.3 dependency by default
URL: https://github.com/apache/spark/pull/26919#issuecomment-566388743
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default

2019-12-16 Thread GitBox
AmplabJenkins commented on issue #26919: [SPARK-30280][DOC] Update docs for 
make Hive 2.3 dependency by default
URL: https://github.com/apache/spark/pull/26919#issuecomment-566388749
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115428/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


