date:20211122

[GitHub] [spark] SparkQA commented on pull request #34070: [SPARK-36840][SQL] Support DPP if there is no selective predicate on the filtering side

2021-11-22 Thread GitBox



SparkQA commented on pull request #34070:
URL: https://github.com/apache/spark/pull/34070#issuecomment-976236122


   **[Test build #145541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145541/testReport)** for PR 34070 at commit [`30deb9d`](https://github.com/apache/spark/commit/30deb9d84e56869af0d55b0da6c933462e3e0785).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



SparkQA commented on pull request #34611:
URL: https://github.com/apache/spark/pull/34611#issuecomment-976235724


   **[Test build #145540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145540/testReport)** for PR 34611 at commit [`af97fb3`](https://github.com/apache/spark/commit/af97fb3a629d07628105868d73ae3ba9d8e6dc90).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976233797


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50007/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976233532


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145537/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34686: [SPARK-37444][SQL] ALTER NAMESPACE ... SET LOCATION should handle empty location consistently across v1 and v2 command

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34686:
URL: https://github.com/apache/spark/pull/34686#issuecomment-976233533


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145525/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976233534







[GitHub] [spark] SparkQA commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



SparkQA commented on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976233773


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50007/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976233797


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50007/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976233532


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145537/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34686: [SPARK-37444][SQL] ALTER NAMESPACE ... SET LOCATION should handle empty location consistently across v1 and v2 command

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34686:
URL: https://github.com/apache/spark/pull/34686#issuecomment-976233533


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145525/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976233534







[GitHub] [spark] SparkQA removed a comment on pull request #34686: [SPARK-37444][SQL] ALTER NAMESPACE ... SET LOCATION should handle empty location consistently across v1 and v2 command

2021-11-22 Thread GitBox



SparkQA removed a comment on pull request #34686:
URL: https://github.com/apache/spark/pull/34686#issuecomment-976116428


   **[Test build #145525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145525/testReport)** for PR 34686 at commit [`28ce116`](https://github.com/apache/spark/commit/28ce116e2cdd35333aae6f58ed579d18d1989597).



[GitHub] [spark] SparkQA commented on pull request #34686: [SPARK-37444][SQL] ALTER NAMESPACE ... SET LOCATION should handle empty location consistently across v1 and v2 command

2021-11-22 Thread GitBox



SparkQA commented on pull request #34686:
URL: https://github.com/apache/spark/pull/34686#issuecomment-976230590


   **[Test build #145525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145525/testReport)** for PR 34686 at commit [`28ce116`](https://github.com/apache/spark/commit/28ce116e2cdd35333aae6f58ed579d18d1989597).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class AlterDatabaseSetLocationCommand(databaseName: String, location: URI)`



[GitHub] [spark] SparkQA commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976228942


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50009/
   



[GitHub] [spark] SparkQA commented on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



SparkQA commented on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976228510


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50010/
   



[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured

2021-11-22 Thread GitBox

yangwwei commented on pull request #34672:
URL: https://github.com/apache/spark/pull/34672#issuecomment-976225163

Thank you @HyukjinKwon, @attilapiros, @tgravescs

>What about extending ShuffleManager trait with a new method indicating
whether this shuffle manager implementation works with the external shuffle
manager or not. It can have a default implementation giving back true and only
needed to be overridden when the external shuffle manager is not supported.

I really like this idea, thank you @attilapiros. How about adding a new
method: `supportExternalShuffleService()`. This method gives each shuffle
manager implementation a way to tell if the external shuffle service is needed
for this shuffle manager to work. By default it returns true, and then the block
manager will register with the external shuffle server; otherwise, that
registration can be skipped.

>So my first reaction is: you have a 3rd party shuffle manager that is an
external shuffle service because it supports dynamic allocation, then why is it
failing... is it because you didn't override something, or because you couldn't
override something? In this case it's creating a ExternalBlockStoreClient,
which I think isn't setup to be overridden. I think it comes down to we just
haven't really added support to allow this.

We actually found this issue while using [Uber's Remote Shuffle
Service](https://github.com/uber/RemoteShuffleService) with DA enabled. This is
due to [this part of
code](https://github.com/apache/spark/blob/5d3a6573a56f9c00ccc513c8131c037de7d29000/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L532-L534)
being hardcoded to register with the external shuffle service even when a 3rd-party
shuffle service is used. We will need a more general way to handle this. Please
let me know your thoughts, thanks!
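The hook proposed above can be sketched in a few lines. Everything here is hypothetical: `supportExternalShuffleService()` is the name suggested in this comment, not an existing Spark API, and the Java interface only stands in for Spark's Scala `ShuffleManager` trait. The sketch models just the registration decision, assumed to be gated on the external shuffle service being enabled (for example via `spark.shuffle.service.enabled`) and on the shuffle manager opting in, which it does by default.

```java
/**
 * Hypothetical sketch of the proposed opt-out hook. None of these types are
 * Spark classes; they only model the decision made before registration.
 */
final class ShuffleRegistrationSketch {

  /** Stand-in for Spark's Scala ShuffleManager trait, reduced to the proposed method. */
  interface ShuffleManagerLike {
    // Proposed hook; defaulting to true keeps today's behavior for existing managers.
    default boolean supportExternalShuffleService() {
      return true;
    }
  }

  /** A manager that relies on the default (e.g. the built-in sort-based one). */
  static final class DefaultShuffleManager implements ShuffleManagerLike {}

  /** A remote-shuffle-service style plugin that opts out of registration. */
  static final class RemoteShuffleManager implements ShuffleManagerLike {
    @Override
    public boolean supportExternalShuffleService() {
      return false;
    }
  }

  /**
   * Register only when the external shuffle service is enabled
   * (e.g. spark.shuffle.service.enabled=true) and the configured
   * shuffle manager reports that it needs it.
   */
  static boolean shouldRegisterWithExternalShuffleService(
      boolean externalShuffleServiceEnabled, ShuffleManagerLike shuffleManager) {
    return externalShuffleServiceEnabled && shuffleManager.supportExternalShuffleService();
  }

  public static void main(String[] args) {
    System.out.println(shouldRegisterWithExternalShuffleService(true, new DefaultShuffleManager()));  // true
    System.out.println(shouldRegisterWithExternalShuffleService(true, new RemoteShuffleManager()));   // false
    System.out.println(shouldRegisterWithExternalShuffleService(false, new DefaultShuffleManager())); // false
  }
}
```

Because the default returns `true`, existing shuffle managers would keep registering unless they override the method, so only plugins such as a remote shuffle service would need to change.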


[GitHub] [spark] SparkQA commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976224175


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50006/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



SparkQA removed a comment on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976207876


   **[Test build #145537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145537/testReport)** for PR 34687 at commit [`a643b5e`](https://github.com/apache/spark/commit/a643b5eff512a397723732c07e51a56aa5044f30).



[GitHub] [spark] SparkQA commented on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



SparkQA commented on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976221908


   **[Test build #145537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145537/testReport)** for PR 34687 at commit [`a643b5e`](https://github.com/apache/spark/commit/a643b5eff512a397723732c07e51a56aa5044f30).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA removed a comment on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976207820


   **[Test build #145536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145536/testReport)** for PR 34688 at commit [`8962685`](https://github.com/apache/spark/commit/8962685238818b506aed70baa8d9336f7c8cc472).



[GitHub] [spark] kazuyukitanimura commented on a change in pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



kazuyukitanimura commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r754852154



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##
@@ -53,19 +53,47 @@ public void skip() {
     throw new UnsupportedOperationException();
   }
 
+  private void updateCurrentByte() {
+    try {
+      currentByte = (byte) in.read();
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read a byte", e);
+    }
+  }
+
   @Override
   public final void readBooleans(int total, WritableColumnVector c, int rowId) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      c.putBoolean(rowId + i, readBoolean());
+    int i = 0;
+    if (bitOffset > 0) {
+      i = Math.min(8 - bitOffset, total);

Review comment:
   The code works for `total = 9` and `bitOffset = 0`:
   
   ```
   public final void readBooleans(...
     if (bitOffset > 0) { // it will not enter here
     ...
     for (; i + 7 < total; i += 8) {
       updateCurrentByte(); // one whole byte (8 bits) is read here
     ...
     if (i < total) {
       updateCurrentByte(); // the last 1 bit is read here
   ```
   
   There is already a similar test for `total = 8` and `bitOffset = 1`, but I added one for `total = 9` and `bitOffset = 0` too.
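The three phases of the patched `readBooleans` (drain the partially consumed byte, copy whole bytes, then load one more byte for the tail) can be checked in isolation. The sketch below is a hypothetical model, not Spark code: a `byte[]` replaces the Parquet input stream, a `boolean[]` replaces `WritableColumnVector`, `putBooleans` is a local helper rather than the column vector method, and bits are assumed to be packed least significant bit first. It compares the batched path against a one-at-a-time reader for the `total = 9`, `bitOffset = 0` case discussed above.

```java
import java.util.Arrays;

/**
 * Hypothetical, self-contained model of the reader discussed in this review.
 * A byte[] stands in for the Parquet input stream and a boolean[] for
 * WritableColumnVector; neither is Spark's API.
 */
final class BitPackedBooleanSketch {
  private final byte[] data;   // bit-packed booleans, least significant bit first (assumed)
  private int pos = 0;         // index of the next byte to load
  private byte currentByte = 0;
  private int bitOffset = 0;   // next bit of currentByte; 0 means "load a new byte first"

  BitPackedBooleanSketch(byte[] data) {
    this.data = data;
  }

  private void updateCurrentByte() {
    currentByte = data[pos++];
  }

  // Copies `count` bits of `src`, starting at bit `srcIndex`, into out[rowId ...].
  private static void putBooleans(boolean[] out, int rowId, int count, byte src, int srcIndex) {
    for (int j = 0; j < count; j++) {
      out[rowId + j] = ((src >>> (srcIndex + j)) & 1) == 1;
    }
  }

  // Reference path: one boolean at a time, as the pre-patch loop did via readBoolean().
  boolean readBoolean() {
    if (bitOffset == 0) {
      updateCurrentByte();
    }
    boolean v = (currentByte & (1 << bitOffset)) != 0;
    bitOffset = (bitOffset + 1) & 7;
    return v;
  }

  // Batched path with the same three phases as the patch under review.
  void readBooleans(int total, boolean[] out, int rowId) {
    int i = 0;
    if (bitOffset > 0) {
      // Phase 1: drain the bits still left in the current byte.
      i = Math.min(8 - bitOffset, total);
      putBooleans(out, rowId, i, currentByte, bitOffset);
      bitOffset = (bitOffset + i) & 7;
    }
    for (; i + 7 < total; i += 8) {
      // Phase 2: whole bytes, 8 booleans per byte read.
      updateCurrentByte();
      putBooleans(out, rowId + i, 8, currentByte, 0);
    }
    if (i < total) {
      // Phase 3: load one more byte and consume only (total - i) of its bits.
      updateCurrentByte();
      bitOffset = total - i;
      putBooleans(out, rowId + i, bitOffset, currentByte, 0);
    }
  }

  public static void main(String[] args) {
    byte[] data = {(byte) 0b1010_0101, (byte) 0b0000_0011};
    // total = 9, bitOffset = 0: phase 1 is skipped, phase 2 reads one byte, phase 3 one bit.
    BitPackedBooleanSketch batched = new BitPackedBooleanSketch(data);
    boolean[] got = new boolean[9];
    batched.readBooleans(9, got, 0);

    BitPackedBooleanSketch single = new BitPackedBooleanSketch(data);
    boolean[] want = new boolean[9];
    for (int k = 0; k < 9; k++) {
      want[k] = single.readBoolean();
    }
    System.out.println(Arrays.equals(got, want)); // true
  }
}
```

After the call both readers sit at `bitOffset = 1` inside the second byte, so later reads also stay in step.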





[GitHub] [spark] SparkQA commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976220161


   **[Test build #145536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145536/testReport)** for PR 34688 at commit [`8962685`](https://github.com/apache/spark/commit/8962685238818b506aed70baa8d9336f7c8cc472).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class CloudPickleSerializer(FramedSerializer):`



[GitHub] [spark] SparkQA commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976219716


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50005/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34681: [SPARK-37438][SQL] ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34681:
URL: https://github.com/apache/spark/pull/34681#issuecomment-976212876


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145528/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34681: [SPARK-37438][SQL] ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34681:
URL: https://github.com/apache/spark/pull/34681#issuecomment-976212876


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145528/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34647: [SPARK-36180][SQL] Support TimestampNTZ type in Hive

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34647:
URL: https://github.com/apache/spark/pull/34647#issuecomment-976211655


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145524/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34681: [SPARK-37438][SQL] ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread GitBox



SparkQA removed a comment on pull request #34681:
URL: https://github.com/apache/spark/pull/34681#issuecomment-976120474


   **[Test build #145528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145528/testReport)** for PR 34681 at commit [`b7d383e`](https://github.com/apache/spark/commit/b7d383e81ef067477aa11c9d4df40ccb0e8c04e4).



[GitHub] [spark] SparkQA commented on pull request #34681: [SPARK-37438][SQL] ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread GitBox



SparkQA commented on pull request #34681:
URL: https://github.com/apache/spark/pull/34681#issuecomment-976212426


   **[Test build #145528 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145528/testReport)** for PR 34681 at commit [`b7d383e`](https://github.com/apache/spark/commit/b7d383e81ef067477aa11c9d4df40ccb0e8c04e4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] kazuyukitanimura commented on a change in pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



kazuyukitanimura commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r754843844



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##
@@ -53,19 +53,50 @@ public void skip() {
     throw new UnsupportedOperationException();
   }
 
+  private void updateCurrentByte() {
+    try {
+      currentByte = (byte) in.read();
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read a byte", e);
+    }
+  }
+
   @Override
   public final void readBooleans(int total, WritableColumnVector c, int rowId) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      c.putBoolean(rowId + i, readBoolean());
+    int i = 0;
+    if (bitOffset > 0) {
+      i = Math.min(8 - bitOffset, total);
+      c.putBooleans(rowId, i, currentByte, bitOffset);
+      bitOffset = (bitOffset + i) & 7;
+    }
+    for (; i + 7 < total; i += 8) {
+      updateCurrentByte();
+      c.putBooleans(rowId + i, currentByte);
+    }
+    if (i < total) {
+      updateCurrentByte();
+      bitOffset = total - i;
+      c.putBooleans(rowId + i, bitOffset, currentByte, 0);
     }
   }
 
   @Override
   public final void skipBooleans(int total) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      readBoolean();
+    // Using >>3 instead of /8 below. The difference is important when (total-(8-bitOffset))<0.
+    // E.g. (-1)>>3=(-1) vs. (-1)/8=0. The latter incorrectly enters the if(numBytesToSkip>=0){.

Review comment:
   Let's say `total=8` and `bitOffset=1`. Then there are `(8-bitOffset)=7` bits left to skip in `currentByte`, and 1 more bit still has to be skipped because `total=8`. So `updateCurrentByte()` needs to be called to load the next `currentByte`, and `bitOffset` will again be `1`. Later reads may then consume the remaining 7 bits of the updated `currentByte`. For that reason, we need to enter the if statement when `numBytesToSkip = (8 - (8 - 1)) >> 3 = 1 >> 3 = 0`.
   
   The following is a few lines longer but uses an equivalent condition:
   ```
   if (numBytesToSkip > 0) {
     try {
       in.skipFully(numBytesToSkip);
     } catch (IOException e) {...}
   }
   if (numBytesToSkip >= 0 && bitOffset > 0) {
     updateCurrentByte();
   }
   ```
   
   The scenario is tested at
   
https://github.com/apache/spark/pull/34611/files#diff-b84cbbb2eadfa9d267b9ab8be2e6be579f28c1813623785c9e667a864f7960e1R194
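The arithmetic behind the `>>3` choice, and the `total=8`, `bitOffset=1` scenario, can be checked with a small model. This is not the patch's `skipBooleans`, whose body is only partly visible here: it is a hypothetical variant that reuses the three-phase structure of `readBooleans` (drain the current byte, skip whole bytes, load one more byte for the tail), so the whole-byte count is computed from a non-negative `(total - i)` and `>> 3` and `/ 8` agree. A `byte[]` stands in for the input stream, advancing `pos` stands in for `in.skipFully(...)`, and bits are assumed to be packed least significant bit first.

```java
/**
 * Hypothetical model (not Spark code) of skipping bit-packed booleans.
 * A byte[] stands in for the Parquet input stream.
 */
final class BooleanSkipSketch {
  private final byte[] data;   // bit-packed booleans, least significant bit first (assumed)
  private int pos = 0;         // index of the next byte to load
  private byte currentByte = 0;
  private int bitOffset = 0;   // next bit of currentByte; 0 means "load a new byte first"

  BooleanSkipSketch(byte[] data) {
    this.data = data;
  }

  private void updateCurrentByte() {
    currentByte = data[pos++];
  }

  // One-at-a-time reference read, used to check the skip below.
  boolean readBoolean() {
    if (bitOffset == 0) {
      updateCurrentByte();
    }
    boolean v = (currentByte & (1 << bitOffset)) != 0;
    bitOffset = (bitOffset + 1) & 7;
    return v;
  }

  // Skips `total` booleans without decoding the whole bytes in between.
  void skipBooleans(int total) {
    int i = 0;
    if (bitOffset > 0) {
      // Consume what is left of the current byte first.
      i = Math.min(8 - bitOffset, total);
      bitOffset = (bitOffset + i) & 7;
    }
    // Here 0 <= total - i, so >> 3 and / 8 give the same whole-byte count.
    int wholeBytes = (total - i) >> 3;
    pos += wholeBytes;           // stands in for in.skipFully(wholeBytes)
    i += wholeBytes << 3;
    if (i < total) {
      // Load the byte holding the last skipped bits; later reads continue in it.
      updateCurrentByte();
      bitOffset = total - i;
    }
  }

  public static void main(String[] args) {
    // The review's point: >> 3 rounds toward negative infinity, / 8 toward zero.
    System.out.println((-1 >> 3) + " vs " + (-1 / 8)); // -1 vs 0

    byte[] data = {(byte) 0x5A, (byte) 0xC3, (byte) 0x0F};
    // Scenario from the review: bitOffset = 1, then skip total = 8.
    BooleanSkipSketch skipped = new BooleanSkipSketch(data);
    skipped.readBoolean();       // consumes bit 0 of byte 0, leaving bitOffset = 1
    skipped.skipBooleans(8);     // 7 bits from byte 0, then 1 bit from byte 1

    BooleanSkipSketch stepped = new BooleanSkipSketch(data);
    for (int k = 0; k < 9; k++) {
      stepped.readBoolean();
    }
    // Both readers should agree on the next 7 booleans (the rest of byte 1).
    boolean same = true;
    for (int k = 0; k < 7; k++) {
      same &= skipped.readBoolean() == stepped.readBoolean();
    }
    System.out.println(same); // true
  }
}
```

Structuring the skip this way avoids depending on how a possibly negative numerator rounds, at the cost of splitting the work into three explicit steps.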
   





[GitHub] [spark] AmplabJenkins commented on pull request #34647: [SPARK-36180][SQL] Support TimestampNTZ type in Hive

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34647:
URL: https://github.com/apache/spark/pull/34647#issuecomment-976211655


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145524/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34647: [SPARK-36180][SQL] Support TimestampNTZ type in Hive

2021-11-22 Thread GitBox



SparkQA removed a comment on pull request #34647:
URL: https://github.com/apache/spark/pull/34647#issuecomment-976098359


   **[Test build #145524 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145524/testReport)**
 for PR 34647 at commit 
[`99b603f`](https://github.com/apache/spark/commit/99b603f7a5b4a52b44e9ebde94c8b3e526e866c2).



[GitHub] [spark] SparkQA commented on pull request #34647: [SPARK-36180][SQL] Support TimestampNTZ type in Hive

2021-11-22 Thread GitBox



SparkQA commented on pull request #34647:
URL: https://github.com/apache/spark/pull/34647#issuecomment-976210466


   **[Test build #145524 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145524/testReport)**
 for PR 34647 at commit 
[`99b603f`](https://github.com/apache/spark/commit/99b603f7a5b4a52b44e9ebde94c8b3e526e866c2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



SparkQA commented on pull request #34611:
URL: https://github.com/apache/spark/pull/34611#issuecomment-976209809


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50008/
   



[GitHub] [spark] SparkQA commented on pull request #34689: [SPARK-37445][BUILD] Upgrade hadoop profile to hadoop-3.3 since we support hadoop-3.3 as default now

2021-11-22 Thread GitBox



SparkQA commented on pull request #34689:
URL: https://github.com/apache/spark/pull/34689#issuecomment-976209156


   **[Test build #145539 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145539/testReport)**
 for PR 34689 at commit 
[`6a446eb`](https://github.com/apache/spark/commit/6a446eb47a7810b685b2e6a5adb9f074a3f1b844).



[GitHub] [spark] SparkQA commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



SparkQA commented on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976208891


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50007/
   



[GitHub] [spark] AngersZhuuuu opened a new pull request #34689: [SPARK-37445][BUILD] Upgrade hadoop profile to hadoop-3.3 since we support hadoop-3.3 as default now

2021-11-22 Thread GitBox



AngersZhuuuu opened a new pull request #34689:
URL: https://github.com/apache/spark/pull/34689


   ### What changes were proposed in this pull request?
   Upgrade the hadoop profile to hadoop-3.3, since hadoop-3.3 is now supported as the default.
   
   In the current project, the deps path is still hadoop-3.2, which is incorrect.
   
   ### Why are the changes needed?
   The hadoop profile should match the default Hadoop version, hadoop-3.3.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Not needed.



[GitHub] [spark] kazuyukitanimura commented on a change in pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



kazuyukitanimura commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r754840444



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##
@@ -53,19 +53,47 @@ public void skip() {
     throw new UnsupportedOperationException();
   }
 
+  private void updateCurrentByte() {
+    try {
+      currentByte = (byte) in.read();
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read a byte", e);
+    }
+  }
+
   @Override
   public final void readBooleans(int total, WritableColumnVector c, int rowId) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      c.putBoolean(rowId + i, readBoolean());
+    int i = 0;
+    if (bitOffset > 0) {
+      i = Math.min(8 - bitOffset, total);
+      c.putBooleans(rowId, i, currentByte, bitOffset);
+      bitOffset = (bitOffset + i) & 7;
+    }
+    for (; i + 7 < total; i += 8) {
+      updateCurrentByte();
+      c.putBooleans(rowId + i, currentByte);
+    }
+    if (i < total) {
+      updateCurrentByte();
+      bitOffset = total - i;
+      c.putBooleans(rowId + i, bitOffset, currentByte, 0);
     }
   }
 
   @Override
   public final void skipBooleans(int total) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      readBoolean();
+    // using >>3 instead of /8 below since Java division rounds towards zero i.e. (-1)/8=0

Review comment:
   Thanks. For the record, I will try to explain here.
   
   Suppose `total=8` and `bitOffset=1`. Then `(8-bitOffset)=7` bits remain to be skipped in `currentByte`, and 1 more bit must still be skipped because `total=8`. So `updateCurrentByte()` has to be called to load the next `currentByte`, after which `bitOffset` is `1` again; the remaining 7 bits of the updated `currentByte` may be read later. That is why we need to enter the if statement when `numBytesToSkip = (8 - (8 - 1)) >> 3 = 1 >> 3 = 0`.
   
   The following condition is equivalent, just a few lines longer:
   ```
   if (numBytesToSkip > 0) {
     try {
       in.skipFully(numBytesToSkip);
     } catch (IOException e) {...}
   }
   if (numBytesToSkip >= 0 && bitOffset > 0) {
     updateCurrentByte();
   }
   ```
   
   The scenario is tested at
   
https://github.com/apache/spark/pull/34611/files#diff-b84cbbb2eadfa9d267b9ab8be2e6be579f28c1813623785c9e667a864f7960e1R194
   
   





[GitHub] [spark] SparkQA commented on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



SparkQA commented on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976207876


   **[Test build #145537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145537/testReport)**
 for PR 34687 at commit 
[`a643b5e`](https://github.com/apache/spark/commit/a643b5eff512a397723732c07e51a56aa5044f30).



[GitHub] [spark] SparkQA commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



SparkQA commented on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976207850


   **[Test build #145538 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145538/testReport)**
 for PR 34677 at commit 
[`0b67651`](https://github.com/apache/spark/commit/0b6765150798799a418d39209b4e5f6d4a16276e).



[GitHub] [spark] SparkQA commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976207820


   **[Test build #145536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145536/testReport)**
 for PR 34688 at commit 
[`8962685`](https://github.com/apache/spark/commit/8962685238818b506aed70baa8d9336f7c8cc472).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34676: [SPARK-37434][BUILD] Add a new profile to auto disable unsupported UTs on MacOs using Apple Silicon

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34676:
URL: https://github.com/apache/spark/pull/34676#issuecomment-976207428


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145531/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976207427







[GitHub] [spark] AmplabJenkins commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976207427







[GitHub] [spark] AmplabJenkins commented on pull request #34676: [SPARK-37434][BUILD] Add a new profile to auto disable unsupported UTs on MacOs using Apple Silicon

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34676:
URL: https://github.com/apache/spark/pull/34676#issuecomment-976207428


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145531/
   



[GitHub] [spark] SparkQA commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976205593


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50006/
   



[GitHub] [spark] sadikovi commented on a change in pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



sadikovi commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r754834288



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##
@@ -53,19 +53,50 @@ public void skip() {
     throw new UnsupportedOperationException();
   }
 
+  private void updateCurrentByte() {
+    try {
+      currentByte = (byte) in.read();
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read a byte", e);
+    }
+  }
+
   @Override
   public final void readBooleans(int total, WritableColumnVector c, int rowId) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      c.putBoolean(rowId + i, readBoolean());
+    int i = 0;
+    if (bitOffset > 0) {
+      i = Math.min(8 - bitOffset, total);
+      c.putBooleans(rowId, i, currentByte, bitOffset);
+      bitOffset = (bitOffset + i) & 7;
+    }
+    for (; i + 7 < total; i += 8) {
+      updateCurrentByte();
+      c.putBooleans(rowId + i, currentByte);
+    }
+    if (i < total) {
+      updateCurrentByte();
+      bitOffset = total - i;
+      c.putBooleans(rowId + i, bitOffset, currentByte, 0);
     }
   }
 
   @Override
   public final void skipBooleans(int total) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      readBoolean();
+    // Using >>3 instead of /8 below. The difference is important when (total-(8-bitOffset))<0.
+    // E.g. (-1)>>3=(-1) vs. (-1)/8=0. The latter incorrectly enters the if(numBytesToSkip>=0){.

Review comment:
   Why do you need to enter the if block at all when numBytesToSkip is 0?
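   To see why a new byte must still be loaded even when no whole byte is skipped, here is a minimal standalone sketch. It assumes a plain `byte[]` in place of the Parquet input stream and LSB-first bit packing (Parquet's PLAIN encoding for BOOLEAN). The class `BitSkipSketch` is hypothetical: it follows the drain / skip-whole-bytes / load-tail structure of the new `readBooleans` rather than reproducing the PR's exact `skipBooleans`.

```java
/**
 * Hypothetical cursor over bit-packed booleans (LSB first) stored in a byte[].
 * bitOffset is the number of bits of currentByte already consumed; 0 means
 * no byte is partially consumed.
 */
final class BitSkipSketch {
  private final byte[] data;
  private int pos = 0;          // index of the next byte to load
  private byte currentByte = 0;
  private int bitOffset = 0;

  BitSkipSketch(byte[] data) {
    this.data = data;
  }

  private void updateCurrentByte() {
    currentByte = data[pos++];
  }

  // Reads one value; loads a new byte only when no bits are pending.
  boolean readBoolean() {
    if (bitOffset == 0) {
      updateCurrentByte();
    }
    boolean v = ((currentByte >> bitOffset) & 1) != 0;
    bitOffset = (bitOffset + 1) & 7;
    return v;
  }

  void skipBooleans(int total) {
    int i = 0;
    if (bitOffset > 0) {
      // Consume the bits still pending in currentByte.
      i = Math.min(8 - bitOffset, total);
      bitOffset = (bitOffset + i) & 7;
    }
    // Skip whole bytes without decoding them; total - i >= 0 here.
    int wholeBytes = (total - i) >> 3;
    pos += wholeBytes;
    i += wholeBytes * 8;
    if (i < total) {
      // Fewer than 8 values remain: load the next byte and mark them consumed,
      // so later reads continue from the rest of this byte.
      updateCurrentByte();
      bitOffset = total - i;
    }
  }
}
```

   For `total=8` starting at `bitOffset=1`, `wholeBytes` is 0, yet the tail branch still loads the next byte and leaves `bitOffset=1`, so the following `readBoolean()` returns bit 1 of that byte (overall bit 9). Without that load, the next read would come from the wrong byte.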





[GitHub] [spark] SparkQA commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976201538


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50005/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34676: [SPARK-37434][BUILD] Add a new profile to auto disable unsupported UTs on MacOs using Apple Silicon

2021-11-22 Thread GitBox



SparkQA removed a comment on pull request #34676:
URL: https://github.com/apache/spark/pull/34676#issuecomment-976149315


   **[Test build #145531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145531/testReport)**
 for PR 34676 at commit 
[`f87467b`](https://github.com/apache/spark/commit/f87467b48d7989fdd026d0b337de617b3f4f9e6d).



[GitHub] [spark] SparkQA removed a comment on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA removed a comment on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976187346


   **[Test build #145534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145534/testReport)**
 for PR 34688 at commit 
[`7fe5438`](https://github.com/apache/spark/commit/7fe5438fc98a8419cf62af9934cccac62c57fdac).



[GitHub] [spark] SparkQA commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976198971


   **[Test build #145534 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145534/testReport)**
 for PR 34688 at commit 
[`7fe5438`](https://github.com/apache/spark/commit/7fe5438fc98a8419cf62af9934cccac62c57fdac).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class CloudPickleSerializer(FramedSerializer):`



[GitHub] [spark] SparkQA commented on pull request #34676: [SPARK-37434][BUILD] Add a new profile to auto disable unsupported UTs on MacOs using Apple Silicon

2021-11-22 Thread GitBox



SparkQA commented on pull request #34676:
URL: https://github.com/apache/spark/pull/34676#issuecomment-976198419


   **[Test build #145531 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145531/testReport)**
 for PR 34676 at commit 
[`f87467b`](https://github.com/apache/spark/commit/f87467b48d7989fdd026d0b337de617b3f4f9e6d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] sadikovi commented on a change in pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



sadikovi commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r754830520



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##
@@ -53,19 +53,47 @@ public void skip() {
     throw new UnsupportedOperationException();
   }
 
+  private void updateCurrentByte() {
+    try {
+      currentByte = (byte) in.read();
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read a byte", e);
+    }
+  }
+
   @Override
   public final void readBooleans(int total, WritableColumnVector c, int rowId) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      c.putBoolean(rowId + i, readBoolean());
+    int i = 0;
+    if (bitOffset > 0) {
+      i = Math.min(8 - bitOffset, total);

Review comment:
   Does the code work for total = 9 and bitOffset = 0? Can you add a test 
case for this?
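   To check the `total = 9`, `bitOffset = 0` case outside Spark, here is a minimal sketch that mirrors the structure of the new `readBooleans` in the diff above. Assumptions: a plain `byte[]` stands in for the input stream, a `boolean[]` stands in for `WritableColumnVector`, and the hypothetical `putBits` helper plays the role of `putBooleans`, reading bits LSB first.

```java
/**
 * Hypothetical standalone mirror of the vectorized readBooleans. Bits are
 * packed LSB first; bitOffset counts the bits of currentByte already consumed.
 */
final class BitReadSketch {
  private final byte[] data;
  private int pos = 0;          // index of the next byte to load
  private byte currentByte = 0;
  private int bitOffset = 0;

  BitReadSketch(byte[] data) {
    this.data = data;
  }

  private void updateCurrentByte() {
    currentByte = data[pos++];
  }

  // Stand-in for putBooleans: copies `count` bits of src, starting at bit srcIndex.
  private static void putBits(boolean[] out, int rowId, int count, byte src, int srcIndex) {
    for (int j = 0; j < count; j++) {
      out[rowId + j] = ((src >> (srcIndex + j)) & 1) != 0;
    }
  }

  void readBooleans(int total, boolean[] out, int rowId) {
    int i = 0;
    if (bitOffset > 0) {
      // Drain bits left over in currentByte from a previous call.
      i = Math.min(8 - bitOffset, total);
      putBits(out, rowId, i, currentByte, bitOffset);
      bitOffset = (bitOffset + i) & 7;
    }
    for (; i + 7 < total; i += 8) {
      // Whole bytes: 8 values per byte.
      updateCurrentByte();
      putBits(out, rowId + i, 8, currentByte, 0);
    }
    if (i < total) {
      // Tail: load one more byte and remember how many of its bits were used.
      updateCurrentByte();
      bitOffset = total - i;
      putBits(out, rowId + i, bitOffset, currentByte, 0);
    }
  }
}
```

   Tracing `total = 9` from `bitOffset = 0`: the drain branch is skipped, the loop copies one full byte (`i` becomes 8), and the tail loads a second byte, copies its bit 0, and leaves `bitOffset = 1` for the next call. A follow-up call then starts by draining the remaining 7 bits of that byte.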





[GitHub] [spark] SparkQA removed a comment on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA removed a comment on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976186208


   **[Test build #145533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145533/testReport)**
 for PR 34688 at commit 
[`a7c71a2`](https://github.com/apache/spark/commit/a7c71a28aed1e9463df7563fb8a590675a6d8417).



[GitHub] [spark] SparkQA commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976196451


   **[Test build #145533 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145533/testReport)**
 for PR 34688 at commit 
[`a7c71a2`](https://github.com/apache/spark/commit/a7c71a28aed1e9463df7563fb8a590675a6d8417).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class CloudPickleSerializer(FramedSerializer):`



[GitHub] [spark] kazuyukitanimura commented on a change in pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



kazuyukitanimura commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r754828616



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##
@@ -53,19 +53,47 @@ public void skip() {
     throw new UnsupportedOperationException();
   }
 
+  private void updateCurrentByte() {
+    try {
+      currentByte = (byte) in.read();
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read a byte", e);
+    }
+  }
+
   @Override
   public final void readBooleans(int total, WritableColumnVector c, int rowId) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      c.putBoolean(rowId + i, readBoolean());
+    int i = 0;
+    if (bitOffset > 0) {
+      i = Math.min(8 - bitOffset, total);

Review comment:
   `readBooleans()` can be called multiple times; that scenario is tested in `ColumnarBatchSuite.scala`.
   
   `readBooleans()` also calls `updateCurrentByte()` right before it uses `currentByte`:
   ```
   public final void readBooleans(...
     if (bitOffset > 0) { // means there are still bits to be read in currentByte
     ...
     for (; i + 7 < total; i += 8) {
       updateCurrentByte(); // calling it here
     ...
     if (i < total) {
       updateCurrentByte(); // calling it here
   ```





[GitHub] [spark] sadikovi commented on a change in pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



sadikovi commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r754826874



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##
@@ -53,19 +53,47 @@ public void skip() {
     throw new UnsupportedOperationException();
   }
 
+  private void updateCurrentByte() {
+    try {
+      currentByte = (byte) in.read();
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read a byte", e);
+    }
+  }
+
   @Override
   public final void readBooleans(int total, WritableColumnVector c, int rowId) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      c.putBoolean(rowId + i, readBoolean());
+    int i = 0;
+    if (bitOffset > 0) {
+      i = Math.min(8 - bitOffset, total);
+      c.putBooleans(rowId, i, currentByte, bitOffset);
+      bitOffset = (bitOffset + i) & 7;
+    }
+    for (; i + 7 < total; i += 8) {
+      updateCurrentByte();
+      c.putBooleans(rowId + i, currentByte);
+    }
+    if (i < total) {
+      updateCurrentByte();
+      bitOffset = total - i;
+      c.putBooleans(rowId + i, bitOffset, currentByte, 0);
     }
   }
 
   @Override
   public final void skipBooleans(int total) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      readBoolean();
+    // using >>3 instead of /8 below since Java division rounds towards zero i.e. (-1)/8=0

Review comment:
   My concern is that the testing coverage is low and we could introduce 
subtle bugs that could be difficult to debug. I guess it is fine to keep as is.





[GitHub] [spark] sadikovi commented on a change in pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



sadikovi commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r754826446



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##
@@ -53,19 +53,47 @@ public void skip() {
     throw new UnsupportedOperationException();
   }
 
+  private void updateCurrentByte() {
+    try {
+      currentByte = (byte) in.read();
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read a byte", e);
+    }
+  }
+
   @Override
   public final void readBooleans(int total, WritableColumnVector c, int rowId) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      c.putBoolean(rowId + i, readBoolean());
+    int i = 0;
+    if (bitOffset > 0) {
+      i = Math.min(8 - bitOffset, total);
+      c.putBooleans(rowId, i, currentByte, bitOffset);
+      bitOffset = (bitOffset + i) & 7;
+    }
+    for (; i + 7 < total; i += 8) {
+      updateCurrentByte();
+      c.putBooleans(rowId + i, currentByte);
+    }
+    if (i < total) {
+      updateCurrentByte();
+      bitOffset = total - i;
+      c.putBooleans(rowId + i, bitOffset, currentByte, 0);
     }
   }
 
   @Override
   public final void skipBooleans(int total) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      readBoolean();
+    // using >>3 instead of /8 below since Java division rounds towards zero i.e. (-1)/8=0

Review comment:
   Hmm.. Why do you even need to go into the if statement if numBytesToSkip 
is 0?
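One way to make the question moot, sketched here as a hypothetical alternative rather than the code under review, is to give `skipBooleans` the same three phases as `readBooleans`. After the partially consumed byte is drained, `total - i` is already non-negative, so no negative byte count is ever computed and `/ 8` behaves the same as `>> 3`. The class `BooleanSkipper` is illustrative only: it assumes a `byte[]` source with LSB-first bit packing, jumps over whole bytes by advancing an index (a stream-based reader would skip bytes on the stream instead), and includes `readBoolean()` only so the skip can be checked against sequential reads.

```java
// Hypothetical skip sketch over a byte[]; not the implementation under review.
final class BooleanSkipper {
  private final byte[] data;
  private int pos = 0;          // index of the next byte to load
  private byte currentByte = 0; // most recently loaded byte
  private int bitOffset = 0;    // bits of currentByte already consumed; 0 = nothing pending

  BooleanSkipper(byte[] data) {
    this.data = data;
  }

  private void updateCurrentByte() {
    currentByte = data[pos++];
  }

  // Reads one value (LSB first), loading a new byte when nothing is pending.
  boolean readBoolean() {
    if (bitOffset == 0) {
      updateCurrentByte();
    }
    boolean value = ((currentByte >> bitOffset) & 1) == 1;
    bitOffset = (bitOffset + 1) & 7;
    return value;
  }

  void skipBooleans(int total) {
    int i = 0;
    if (bitOffset > 0) {
      // Phase 1: consume what is left of currentByte (possibly only part of it).
      i = Math.min(8 - bitOffset, total);
      bitOffset = (bitOffset + i) & 7;
    }
    // total - i >= 0 here, so "/ 8" and ">> 3" give the same result.
    int wholeBytes = (total - i) / 8;
    pos += wholeBytes; // Phase 2: jump over whole bytes without decoding them
    i += wholeBytes * 8;
    if (i < total) {
      // Phase 3: load the byte holding the last skipped values; its remaining bits stay pending.
      updateCurrentByte();
      bitOffset = total - i;
    }
  }
}
```

With this shape, a zero whole-byte count simply advances nothing, so there is no separate branch to justify.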





[GitHub] [spark] SparkQA commented on pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



SparkQA commented on pull request #34611:
URL: https://github.com/apache/spark/pull/34611#issuecomment-976191725


   **[Test build #145535 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145535/testReport)**
 for PR 34611 at commit 
[`f5327bc`](https://github.com/apache/spark/commit/f5327bc8ba14e8f00f3a296889f4d65792848f68).



[GitHub] [spark] HyukjinKwon commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



HyukjinKwon commented on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976191464


   retest this please



[GitHub] [spark] gengliangwang closed pull request #34681: [SPARK-37438][SQL] ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread GitBox



gengliangwang closed pull request #34681:
URL: https://github.com/apache/spark/pull/34681


   



[GitHub] [spark] gengliangwang commented on pull request #34681: [SPARK-37438][SQL] ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread GitBox



gengliangwang commented on pull request #34681:
URL: https://github.com/apache/spark/pull/34681#issuecomment-976191032


   Merging to master



[GitHub] [spark] kazuyukitanimura commented on a change in pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



kazuyukitanimura commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r754823543



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##
@@ -53,19 +53,47 @@ public void skip() {
     throw new UnsupportedOperationException();
   }
 
+  private void updateCurrentByte() {
+    try {
+      currentByte = (byte) in.read();
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read a byte", e);
+    }
+  }
+
   @Override
   public final void readBooleans(int total, WritableColumnVector c, int rowId) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      c.putBoolean(rowId + i, readBoolean());
+    int i = 0;
+    if (bitOffset > 0) {
+      i = Math.min(8 - bitOffset, total);
+      c.putBooleans(rowId, i, currentByte, bitOffset);
+      bitOffset = (bitOffset + i) & 7;
+    }
+    for (; i + 7 < total; i += 8) {
+      updateCurrentByte();
+      c.putBooleans(rowId + i, currentByte);
+    }
+    if (i < total) {
+      updateCurrentByte();
+      bitOffset = total - i;
+      c.putBooleans(rowId + i, bitOffset, currentByte, 0);
     }
   }
 
   @Override
   public final void skipBooleans(int total) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      readBoolean();
+    // using >>3 instead of /8 below since Java division rounds towards zero i.e. (-1)/8=0

Review comment:
   Oh I see. The difference matters when `(total-(8-bitOffset))<0`, e.g. `(-1)>>3=(-1)` vs. `(-1)/8=0`. The latter incorrectly enters the `if(numBytesToSkip>=0){` branch. I updated the comment; hopefully it is clearer now.
   
   We should not call
   ```
   if (bitOffset > 0) {
     updateCurrentByte();
   }
   ```
   outside of the `if (numBytesToSkip >= 0) {` clause, because `numBytesToSkip<0` <=> `(total-(8-bitOffset))<0` means there will still be unread bits left in `currentByte`.
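The arithmetic behind this comment can be checked in isolation. For any Java `int` `x`, `x >> 3` is an arithmetic shift and equals `Math.floorDiv(x, 8)` (rounding toward negative infinity), while `x / 8` truncates toward zero, so the two agree for `x >= 0` and differ for negative values that are not multiples of 8. The sketch below is a hypothetical model of only the guard discussed here (the class and method names are illustrative, not Spark's), with `deficit = total - (8 - bitOffset)` and `bitOffset` between 1 and 7, so a negative deficit lies in -7..-1.

```java
// Hypothetical model of the skip guard discussed above; names are illustrative, not Spark's.
final class ByteSkipGuard {
  // Bits to skip beyond the rest of currentByte; negative when the skip ends inside it.
  static int deficit(int total, int bitOffset) {
    return total - (8 - bitOffset);
  }

  // Truncating division rounds toward zero, so a deficit of -7..-1 becomes 0.
  static int bytesWithDivision(int total, int bitOffset) {
    return deficit(total, bitOffset) / 8;
  }

  // Arithmetic shift rounds toward negative infinity, so a deficit of -7..-1 becomes -1.
  static int bytesWithShift(int total, int bitOffset) {
    return deficit(total, bitOffset) >> 3;
  }

  // True when unread bits remain in currentByte after skipping `total` values.
  static boolean leavesUnreadBits(int total, int bitOffset) {
    return deficit(total, bitOffset) < 0;
  }
}
```

For example, with `total = 2` and `bitOffset = 3` the deficit is `-3`: `-3 / 8` is `0`, which passes a `>= 0` check, while `-3 >> 3` is `-1`, which does not, matching the fact that 3 bits of `currentByte` are still unread.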





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976188106


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50004/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976188106


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50004/
   



[GitHub] [spark] SparkQA commented on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



SparkQA commented on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976188085


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50004/
   



[GitHub] [spark] SparkQA commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976187346


   **[Test build #145534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145534/testReport)**
 for PR 34688 at commit 
[`7fe5438`](https://github.com/apache/spark/commit/7fe5438fc98a8419cf62af9934cccac62c57fdac).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976186987


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145530/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976186987


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145530/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



SparkQA removed a comment on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976149296


   **[Test build #145530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145530/testReport)**
 for PR 34677 at commit 
[`0b67651`](https://github.com/apache/spark/commit/0b6765150798799a418d39209b4e5f6d4a16276e).



[GitHub] [spark] SparkQA commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



SparkQA commented on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976186594


   **[Test build #145530 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145530/testReport)**
 for PR 34677 at commit 
[`0b67651`](https://github.com/apache/spark/commit/0b6765150798799a418d39209b4e5f6d4a16276e).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class SQLStringFormatter(string.Formatter):`



[GitHub] [spark] SparkQA commented on pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



SparkQA commented on pull request #34688:
URL: https://github.com/apache/spark/pull/34688#issuecomment-976186208


   **[Test build #145533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145533/testReport)**
 for PR 34688 at commit 
[`a7c71a2`](https://github.com/apache/spark/commit/a7c71a28aed1e9463df7563fb8a590675a6d8417).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976185595


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50001/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976185596


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50002/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34676: [SPARK-37434][BUILD] Add a new profile to auto disable unsupported UTs on MacOs using Apple Silicon

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34676:
URL: https://github.com/apache/spark/pull/34676#issuecomment-976185598


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50003/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34676: [SPARK-37434][BUILD] Add a new profile to auto disable unsupported UTs on MacOs using Apple Silicon

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34676:
URL: https://github.com/apache/spark/pull/34676#issuecomment-976185598


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50003/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976185596


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50002/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976185595


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50001/
   



[GitHub] [spark] HyukjinKwon opened a new pull request #34688: [WIP][SPARK-32079][PYTHON] Remove namedtuple hack by replace built-in pickle to cloudpickle

2021-11-22 Thread GitBox



HyukjinKwon opened a new pull request #34688:
URL: https://github.com/apache/spark/pull/34688


   ### What changes were proposed in this pull request?
   
   This PR proposes to replace Python's built-in CPickle with CPickle-based cloudpickle (requires Python 3.8+).
   For Python 3.7 and below, it still uses the legacy built-in CPickle for performance reasons.
   
   ### Why are the changes needed?
   
   To remove the namedtuple hack used to work around issues such as SPARK-32079, SPARK-22674, and SPARK-27810.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing test cases should cover this change.



[GitHub] [spark] SparkQA commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



SparkQA commented on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976179811


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50002/
   



[GitHub] [spark] SparkQA commented on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



SparkQA commented on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976178977


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50001/
   



[GitHub] [spark] SparkQA commented on pull request #34676: [SPARK-37434][BUILD] Add a new profile to auto disable unsupported UTs on MacOs using Apple Silicon

2021-11-22 Thread GitBox



SparkQA commented on pull request #34676:
URL: https://github.com/apache/spark/pull/34676#issuecomment-976177156


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50003/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34685: [SPARK-37443][PYTHON] Provide a profiler for Python/Pandas UDFs

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34685:
URL: https://github.com/apache/spark/pull/34685#issuecomment-976172532


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145526/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34685: [SPARK-37443][PYTHON] Provide a profiler for Python/Pandas UDFs

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34685:
URL: https://github.com/apache/spark/pull/34685#issuecomment-976172532


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145526/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34685: [SPARK-37443][PYTHON] Provide a profiler for Python/Pandas UDFs

2021-11-22 Thread GitBox



SparkQA removed a comment on pull request #34685:
URL: https://github.com/apache/spark/pull/34685#issuecomment-976116444


   **[Test build #145526 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145526/testReport)**
 for PR 34685 at commit 
[`b9ede68`](https://github.com/apache/spark/commit/b9ede68a8d4d1a49164b2f887ce619c2166cf504).



[GitHub] [spark] SparkQA commented on pull request #34685: [SPARK-37443][PYTHON] Provide a profiler for Python/Pandas UDFs

2021-11-22 Thread GitBox



SparkQA commented on pull request #34685:
URL: https://github.com/apache/spark/pull/34685#issuecomment-976172067


   **[Test build #145526 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145526/testReport)**
 for PR 34685 at commit 
[`b9ede68`](https://github.com/apache/spark/commit/b9ede68a8d4d1a49164b2f887ce619c2166cf504).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34681: [SPARK-37438][SQL] ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34681:
URL: https://github.com/apache/spark/pull/34681#issuecomment-976171560


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/5/
   



[GitHub] [spark] SparkQA commented on pull request #34681: [SPARK-37438][SQL] ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread GitBox



SparkQA commented on pull request #34681:
URL: https://github.com/apache/spark/pull/34681#issuecomment-976171544


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/5/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34681: [SPARK-37438][SQL] ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34681:
URL: https://github.com/apache/spark/pull/34681#issuecomment-976171560


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/5/
   



[GitHub] [spark] SparkQA commented on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



SparkQA commented on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976169810


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50004/
   



[GitHub] [spark] sadikovi commented on a change in pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



sadikovi commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r754804617



##
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##
@@ -53,19 +53,47 @@ public void skip() {
     throw new UnsupportedOperationException();
   }
 
+  private void updateCurrentByte() {
+    try {
+      currentByte = (byte) in.read();
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read a byte", e);
+    }
+  }
+
   @Override
   public final void readBooleans(int total, WritableColumnVector c, int rowId) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      c.putBoolean(rowId + i, readBoolean());
+    int i = 0;
+    if (bitOffset > 0) {
+      i = Math.min(8 - bitOffset, total);
Review comment:
   Yes, I understand, but what about the `readBooleans()` method? Does it mean that I can only call `readBooleans()` once?
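The new `readBooleans()` in the diff context above starts by finishing the bits left in a partially consumed byte (`i = Math.min(8 - bitOffset, total)`) before touching the next byte. The sketch below illustrates that head/body/tail split under stated assumptions: values are bit-packed one bit each, least significant bit first (Parquet's PLAIN boolean layout), the input is a plain `byte[]` and the output a `boolean[]` instead of Spark's input stream and `WritableColumnVector`, and the names `BitUnpackSketch` and `unpack` are hypothetical. It is not the PR's code. Because the returned bit position is fed into the next call, the method can be called any number of times in a row, which is the situation raised in the question above.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the head/body/tail split; not Spark's API.
final class BitUnpackSketch {

  /**
   * Unpacks {@code total} booleans starting at absolute bit position {@code bitPos}
   * of {@code packed} into {@code out[rowId .. rowId + total)}. Values are assumed
   * to be bit-packed LSB first, as in Parquet's PLAIN boolean encoding.
   * Returns the bit position just after the last value read.
   */
  static long unpack(byte[] packed, long bitPos, int total, boolean[] out, int rowId) {
    int byteIdx = (int) (bitPos >>> 3);
    int bitOffset = (int) (bitPos & 7);
    int i = 0;

    // Head: finish the partially consumed byte before moving to the next one.
    if (bitOffset > 0) {
      int head = Math.min(8 - bitOffset, total);
      byte b = packed[byteIdx];
      for (; i < head; i++) {
        out[rowId + i] = (b & (1 << (bitOffset + i))) != 0;
      }
      bitOffset += head;
      if (bitOffset == 8) { // byte exhausted: the next value starts a new byte
        bitOffset = 0;
        byteIdx++;
      }
    }

    // Body: whole bytes, eight values per byte.
    for (; i + 8 <= total; i += 8) {
      byte b = packed[byteIdx++];
      for (int k = 0; k < 8; k++) {
        out[rowId + i + k] = (b & (1 << k)) != 0;
      }
    }

    // Tail: fewer than eight values left, taken from the low bits of the next byte.
    if (i < total) {
      byte b = packed[byteIdx];
      int tail = total - i;
      for (int k = 0; k < tail; k++) {
        out[rowId + i + k] = (b & (1 << k)) != 0;
      }
      bitOffset = tail;
    }
    return ((long) byteIdx << 3) + bitOffset;
  }

  public static void main(String[] args) {
    byte[] packed = {(byte) 0b1010_0101, (byte) 0b0000_0011};
    boolean[] out = new boolean[10];
    long pos = unpack(packed, 0, 3, out, 0); // starts on a byte boundary
    pos = unpack(packed, pos, 7, out, 3);    // finishes byte 0, then 2 bits of byte 1
    System.out.println(Arrays.toString(out) + " next bit = " + pos);
  }
}
```

`main` prints `[true, false, true, false, false, true, false, true, true, true] next bit = 10`: the second call consumes the five bits left in byte 0 before reading two bits of byte 1.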





[GitHub] [spark] kazuyukitanimura commented on a change in pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-22 Thread GitBox



kazuyukitanimura commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r754803738



##
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##
@@ -53,19 +53,47 @@ public void skip() {
     throw new UnsupportedOperationException();
   }
 
+  private void updateCurrentByte() {
+    try {
+      currentByte = (byte) in.read();
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read a byte", e);
+    }
+  }
+
   @Override
   public final void readBooleans(int total, WritableColumnVector c, int rowId) {
-    // TODO: properly vectorize this
-    for (int i = 0; i < total; i++) {
-      c.putBoolean(rowId + i, readBoolean());
+    int i = 0;
+    if (bitOffset > 0) {
+      i = Math.min(8 - bitOffset, total);
Review comment:
   We do not need to call `updateCurrentByte()` here when `total == 8 - bitOffset`. If you scroll down to the one-bit version of the reader in the same file, you will find:
   ```
   public final boolean readBoolean() {
     if (bitOffset == 0) {
       updateCurrentByte();
     }
   ```
   It calls `updateCurrentByte()` right before reading, so it is always the next reader's responsibility to call the method.
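The lazy-fetch rule described in this comment can be illustrated with a small stateful reader. This is a minimal sketch, assuming a `java.io.InputStream` source and using `UncheckedIOException` as a stand-in for `ParquetDecodingException`; `LazyBitReader` and its `bytesFetched` counter are hypothetical names. A new byte is fetched only when `bitOffset == 0` and a bit is actually needed, so a batch read that ends exactly on a byte boundary (`total == 8 - bitOffset`) leaves `bitOffset` at 0 without touching the stream, and whichever read comes next does the fetch. The batch method here drains the buffered byte and then falls back to per-bit reads; it shows the state protocol, not a vectorized decoder.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical reader illustrating the lazy-fetch protocol; not Spark's class.
final class LazyBitReader {
  private final InputStream in;
  private byte currentByte;
  private int bitOffset; // 0 means no unread bits are buffered in currentByte
  int bytesFetched;      // counts stream reads, only to make the laziness visible

  LazyBitReader(InputStream in) {
    this.in = in;
  }

  // Stand-in for updateCurrentByte(); UncheckedIOException replaces ParquetDecodingException.
  private void updateCurrentByte() {
    try {
      int next = in.read();
      if (next < 0) {
        throw new IllegalStateException("No more bytes to read");
      }
      currentByte = (byte) next;
      bytesFetched++;
    } catch (IOException e) {
      throw new UncheckedIOException("Failed to read a byte", e);
    }
  }

  // One-bit reader: fetches a byte right before reading its first bit.
  boolean readBoolean() {
    if (bitOffset == 0) {
      updateCurrentByte();
    }
    boolean v = (currentByte & (1 << bitOffset)) != 0;
    bitOffset = (bitOffset + 1) & 7; // wraps back to 0 after the eighth bit
    return v;
  }

  // Batch reader: drains the buffered byte without fetching, then defers to readBoolean().
  void readBooleans(int total, boolean[] out, int rowId) {
    int i = 0;
    if (bitOffset > 0) {
      int head = Math.min(8 - bitOffset, total);
      for (; i < head; i++) {
        out[rowId + i] = (currentByte & (1 << bitOffset)) != 0;
        bitOffset++;
      }
      bitOffset &= 7; // 8 becomes 0: the byte is used up, the next read fetches
    }
    for (; i < total; i++) {
      out[rowId + i] = readBoolean();
    }
  }

  public static void main(String[] args) {
    LazyBitReader r = new LazyBitReader(
        new ByteArrayInputStream(new byte[] {(byte) 0b1010_0101, (byte) 0b0000_0011}));
    r.readBoolean();            // fetches byte 0; bitOffset becomes 1
    boolean[] out = new boolean[9];
    r.readBooleans(7, out, 0);  // total == 8 - bitOffset: no fetch
    System.out.println(r.bytesFetched);
    r.readBooleans(2, out, 7);  // bitOffset == 0: readBoolean() fetches byte 1
    System.out.println(r.bytesFetched);
  }
}
```

Running `main` prints `1` and then `2`: the first batch consumes the rest of byte 0 without a fetch, and the second batch triggers the fetch of byte 1.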





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34685: [SPARK-37443][PYTHON] Provide a profiler for Python/Pandas UDFs

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34685:
URL: https://github.com/apache/spark/pull/34685#issuecomment-976168219


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49998/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34686: [SPARK-37444][SQL] ALTER NAMESPACE ... SET LOCATION should handle empty location consistently across v1 and v2 command

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34686:
URL: https://github.com/apache/spark/pull/34686#issuecomment-976168221


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49997/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976168174


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/4/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



AmplabJenkins removed a comment on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976168217







[GitHub] [spark] AmplabJenkins commented on pull request #34686: [SPARK-37444][SQL] ALTER NAMESPACE ... SET LOCATION should handle empty location consistently across v1 and v2 command

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34686:
URL: https://github.com/apache/spark/pull/34686#issuecomment-976168221


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49997/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34685: [SPARK-37443][PYTHON] Provide a profiler for Python/Pandas UDFs

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34685:
URL: https://github.com/apache/spark/pull/34685#issuecomment-976168219


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49998/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34687:
URL: https://github.com/apache/spark/pull/34687#issuecomment-976168218







[GitHub] [spark] AmplabJenkins commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



AmplabJenkins commented on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976168174


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/4/
   



[GitHub] [spark] SparkQA commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-22 Thread GitBox



SparkQA commented on pull request #34677:
URL: https://github.com/apache/spark/pull/34677#issuecomment-976168126


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/4/
   


