[jira] [Resolved] (SPARK-46337) Make `CTESubstitution` retain the PLAN_ID_TAG
[ https://issues.apache.org/jira/browse/SPARK-46337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46337. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44268 [https://github.com/apache/spark/pull/44268] > Make `CTESubstitution` retain the PLAN_ID_TAG > -- > > Key: SPARK-46337 > URL: https://issues.apache.org/jira/browse/SPARK-46337 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46337) Make `CTESubstitution` retain the PLAN_ID_TAG
[ https://issues.apache.org/jira/browse/SPARK-46337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46337: Assignee: Ruifeng Zheng > Make `CTESubstitution` retain the PLAN_ID_TAG > -- > > Key: SPARK-46337 > URL: https://issues.apache.org/jira/browse/SPARK-46337 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available
[jira] [Assigned] (SPARK-46339) Directory with number name should not be treated as metadata log
[ https://issues.apache.org/jira/browse/SPARK-46339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-46339: --- Assignee: L. C. Hsieh > Directory with number name should not be treated as metadata log > > > Key: SPARK-46339 > URL: https://issues.apache.org/jira/browse/SPARK-46339 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.3, 3.4.2, 3.5.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > > HDFSMetadataLog takes a metadata path as a parameter. When it retrieves all batch metadata, it calls `CheckpointFileManager.list` to get all files under the metadata path. However, all current implementations of `CheckpointFileManager.list` return all files and directories under the given path. So if there is a directory whose name is a batch number (a long value), that directory is returned too and causes trouble when HDFSMetadataLog tries to read it. The `CheckpointFileManager.list` method is clearly documented to list the "files" in a path; current implementations don't follow that doc. We should fix it.
[jira] [Updated] (SPARK-46339) Directory with number name should not be treated as metadata log
[ https://issues.apache.org/jira/browse/SPARK-46339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46339: --- Labels: pull-request-available (was: ) > Directory with number name should not be treated as metadata log > > > Key: SPARK-46339 > URL: https://issues.apache.org/jira/browse/SPARK-46339 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.3, 3.4.2, 3.5.0 >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > > HDFSMetadataLog takes a metadata path as a parameter. When it retrieves all batch metadata, it calls `CheckpointFileManager.list` to get all files under the metadata path. However, all current implementations of `CheckpointFileManager.list` return all files and directories under the given path. So if there is a directory whose name is a batch number (a long value), that directory is returned too and causes trouble when HDFSMetadataLog tries to read it. The `CheckpointFileManager.list` method is clearly documented to list the "files" in a path; current implementations don't follow that doc. We should fix it.
[jira] [Created] (SPARK-46339) Directory with number name should not be treated as metadata log
L. C. Hsieh created SPARK-46339: --- Summary: Directory with number name should not be treated as metadata log Key: SPARK-46339 URL: https://issues.apache.org/jira/browse/SPARK-46339 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.5.0, 3.4.2, 3.3.3 Reporter: L. C. Hsieh HDFSMetadataLog takes a metadata path as a parameter. When it retrieves all batch metadata, it calls `CheckpointFileManager.list` to get all files under the metadata path. However, all current implementations of `CheckpointFileManager.list` return all files and directories under the given path. So if there is a directory whose name is a batch number (a long value), that directory is returned too and causes trouble when HDFSMetadataLog tries to read it. The `CheckpointFileManager.list` method is clearly documented to list the "files" in a path; current implementations don't follow that doc. We should fix it.
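The fix direction described above, honoring the documented contract that `list` returns only files, can be sketched in plain Java. The helper name `listBatchIds` and the filtering logic are illustrative stand-ins, not Spark's actual `CheckpointFileManager` or `HDFSMetadataLog` code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BatchFileListing {
    // Return only the regular files whose names parse as batch numbers
    // (long values), skipping any directory that happens to have a numeric
    // name -- the situation that trips up the metadata log reader.
    static List<Long> listBatchIds(Path metadataPath) throws IOException {
        try (Stream<Path> entries = Files.list(metadataPath)) {
            return entries
                .filter(Files::isRegularFile)          // honor the "files" contract
                .map(p -> p.getFileName().toString())
                .filter(name -> name.matches("\\d+"))  // batch ids are long values
                .map(Long::parseLong)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("metadata");
        Files.createFile(dir.resolve("0"));       // real batch file
        Files.createFile(dir.resolve("1"));       // real batch file
        Files.createDirectory(dir.resolve("2"));  // numeric-named directory: ignored
        System.out.println(listBatchIds(dir));    // [0, 1]
    }
}
```

The point of the sketch is the `isRegularFile` filter: listing and filtering are separate concerns, and the directory with a numeric name never reaches the parsing step.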
[jira] [Updated] (SPARK-46338) Re-enable the get_item test for BasicIndexingTests
[ https://issues.apache.org/jira/browse/SPARK-46338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46338: --- Labels: pull-request-available (was: ) > Re-enable the get_item test for BasicIndexingTests > -- > > Key: SPARK-46338 > URL: https://issues.apache.org/jira/browse/SPARK-46338 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark, Tests >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-46338) Re-enable the get_item test for BasicIndexingTests
Haejoon Lee created SPARK-46338: --- Summary: Re-enable the get_item test for BasicIndexingTests Key: SPARK-46338 URL: https://issues.apache.org/jira/browse/SPARK-46338 Project: Spark Issue Type: Bug Components: Pandas API on Spark, Tests Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Updated] (SPARK-46337) Make `CTESubstitution` retain the PLAN_ID_TAG
[ https://issues.apache.org/jira/browse/SPARK-46337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46337: --- Labels: pull-request-available (was: ) > Make `CTESubstitution` retain the PLAN_ID_TAG > -- > > Key: SPARK-46337 > URL: https://issues.apache.org/jira/browse/SPARK-46337 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-46337) Make `CTESubstitution` retain the PLAN_ID_TAG
Ruifeng Zheng created SPARK-46337: - Summary: Make `CTESubstitution` retain the PLAN_ID_TAG Key: SPARK-46337 URL: https://issues.apache.org/jira/browse/SPARK-46337 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Ruifeng Zheng
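The intent behind this title, retaining a plan-id tag when a rewrite rule such as CTE substitution replaces a node, can be shown in miniature. The `PlanNode` class and string tag below are hypothetical stand-ins, not Spark's actual `TreeNode` tag API:

```java
import java.util.HashMap;
import java.util.Map;

public class TagRetention {
    // Minimal stand-in for a plan node carrying metadata tags.
    static class PlanNode {
        final String name;
        final Map<String, Object> tags = new HashMap<>();
        PlanNode(String name) { this.name = name; }
    }

    static final String PLAN_ID_TAG = "plan_id";

    // A substitution rule that replaces one node with another must copy the
    // original's tags onto the replacement, or later lookups by plan id
    // (as Spark Connect performs) find nothing.
    static PlanNode substitute(PlanNode original, PlanNode replacement) {
        replacement.tags.putAll(original.tags);  // the "retain the tag" step
        return replacement;
    }

    public static void main(String[] args) {
        PlanNode ref = new PlanNode("CTERef");
        ref.tags.put(PLAN_ID_TAG, 42L);
        PlanNode resolved = substitute(ref, new PlanNode("ResolvedPlan"));
        System.out.println(resolved.tags.get(PLAN_ID_TAG)); // 42
    }
}
```

Without the `putAll` line the replacement node would come back with an empty tag map, which is the bug class this issue addresses.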
[jira] [Resolved] (SPARK-46334) Update pandas to 2.1.4
[ https://issues.apache.org/jira/browse/SPARK-46334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46334. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44266 [https://github.com/apache/spark/pull/44266] > Update pandas to 2.1.4 > -- > > Key: SPARK-46334 > URL: https://issues.apache.org/jira/browse/SPARK-46334 > Project: Spark > Issue Type: Dependency upgrade > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-46335) Upgrade Maven to 3.9.6 for MNG-7913
[ https://issues.apache.org/jira/browse/SPARK-46335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-46335. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44267 [https://github.com/apache/spark/pull/44267] > Upgrade Maven to 3.9.6 for MNG-7913 > --- > > Key: SPARK-46335 > URL: https://issues.apache.org/jira/browse/SPARK-46335 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Commented] (SPARK-46336) Cancel zombie tasks when a ShuffleMapStage is finished
[ https://issues.apache.org/jira/browse/SPARK-46336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794888#comment-17794888 ] Xingbo Jiang commented on SPARK-46336: -- I'm currently working on this. > Cancel zombie tasks when a ShuffleMapStage is finished > -- > > Key: SPARK-46336 > URL: https://issues.apache.org/jira/browse/SPARK-46336 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Xingbo Jiang >Assignee: Xingbo Jiang >Priority: Major > > Spark cancels zombie or speculative tasks when a job is finished. We should > also cancel the running tasks when a ShuffleMapStage is finished, because we > won’t be using the result of those tasks any more.
[jira] [Created] (SPARK-46336) Cancel zombie tasks when a ShuffleMapStage is finished
Xingbo Jiang created SPARK-46336: Summary: Cancel zombie tasks when a ShuffleMapStage is finished Key: SPARK-46336 URL: https://issues.apache.org/jira/browse/SPARK-46336 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 3.5.0 Reporter: Xingbo Jiang Assignee: Xingbo Jiang Spark cancels zombie or speculative tasks when a job is finished. We should also cancel the running tasks when a ShuffleMapStage is finished, because we won’t be using the result of those tasks any more.
[jira] [Updated] (SPARK-42789) Rewrite multiple GetJsonObjects to a JsonTuple if their json expression is the same
[ https://issues.apache.org/jira/browse/SPARK-42789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42789: --- Labels: pull-request-available (was: ) > Rewrite multiple GetJsonObjects to a JsonTuple if their json expression is > the same > --- > > Key: SPARK-42789 > URL: https://issues.apache.org/jira/browse/SPARK-42789 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > > Benchmark result:
> {noformat}
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 2    Stopped after 2 iterations, 77193 ms
>   Running case: Rewrite: 2    Stopped after 2 iterations, 51699 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> ----------------------------------------------------------------------------------------------------------
> Default: 2                                 37914         38597        966        0.2       5244.0      1.0X
> Rewrite: 2                                 24887         25850       1361        0.3       3442.2      1.5X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 3    Stopped after 2 iterations, 110890 ms
>   Running case: Rewrite: 3    Stopped after 2 iterations, 56102 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> ----------------------------------------------------------------------------------------------------------
> Default: 3                                 52862         55445        NaN        0.1       7311.6      1.0X
> Rewrite: 3                                 26752         28051       1837        0.3       3700.2      2.0X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 4    Stopped after 2 iterations, 150828 ms
>   Running case: Rewrite: 4    Stopped after 2 iterations, 57110 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> ----------------------------------------------------------------------------------------------------------
> Default: 4                                 71680         75414        NaN        0.1       9914.4      1.0X
> Rewrite: 4                                 28452         28555        145        0.3       3935.4      2.5X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 5    Stopped after 2 iterations, 223367 ms
>   Running case: Rewrite: 5    Stopped after 2 iterations, 78193 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> ----------------------------------------------------------------------------------------------------------
> Default: 5                                108479        111684       1447        0.1      15004.2      1.0X
> Rewrite: 5                                 36830         39097        NaN        0.2       5094.0      2.9X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 10   Stopped after 2 iterations, 311453 ms
>   Running case: Rewrite: 10   Stopped after 2 iterations, 65873 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> ----------------------------------------------------------------------------------------------------------
> Default: 10                               153952        155727       2510        0.0      21293.7      1.0X
> Rewrite: 10                                32436         32937        708        0.2       4486.3      4.7X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 15   Stopped after 2 iterations, 451911 ms
>   Running
[jira] [Updated] (SPARK-44998) No need to retry parsing event log path again when FileNotFoundException occurs
[ https://issues.apache.org/jira/browse/SPARK-44998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44998: --- Labels: pull-request-available (was: ) > No need to retry parsing event log path again when FileNotFoundException > occurs > --- > > Key: SPARK-44998 > URL: https://issues.apache.org/jira/browse/SPARK-44998 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: Zhen Wang >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-08-29-10-47-08-027.png, > image-2023-08-29-10-47-43-567.png > > > I found many retries of parsing in-progress event log paths in the history server log. The application is already done by the time it is parsed, so we don't need to retry parsing it when a FileNotFoundException occurs. > > !image-2023-08-29-10-47-08-027.png! > !image-2023-08-29-10-47-43-567.png!
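The improvement above amounts to treating FileNotFoundException as a final outcome rather than a transient one. A minimal Java sketch of that retry policy, with illustrative names rather than the history server's real code:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Optional;

public class NoRetryOnMissingLog {
    interface Parser { String parse() throws IOException; }

    // Retry transient IO failures up to maxAttempts, but give up immediately
    // on FileNotFoundException: the in-progress log was removed because the
    // application already finished, so retrying can never succeed.
    static Optional<String> parseWithRetry(Parser parser, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.of(parser.parse());
            } catch (FileNotFoundException gone) {
                return Optional.empty();   // permanent: do not retry
            } catch (IOException transientErr) {
                // transient failure: fall through and retry
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> r =
            parseWithRetry(() -> { throw new FileNotFoundException("eventlog.inprogress"); }, 3);
        System.out.println(r.isPresent()); // false, and only one attempt was made
    }
}
```

The design point is the ordering of the catch clauses: the more specific FileNotFoundException is handled before the general IOException, so a missing file short-circuits the loop.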
[jira] [Updated] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace
[ https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-41006: --- Labels: pull-request-available (was: ) > ConfigMap has the same name when launching two pods on the same namespace > - > > Key: SPARK-41006 > URL: https://issues.apache.org/jira/browse/SPARK-41006 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Eric >Priority: Minor > Labels: pull-request-available > > If we use the Spark Launcher to launch our Spark apps in k8s: > {code:java} > val sparkLauncher = new InProcessLauncher() > .setMaster(k8sMaster) > .setDeployMode(deployMode) > .setAppName(appName) > .setVerbose(true) > sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code} > We have an issue when we launch another Spark driver in the same namespace > where another Spark app was running: > {code:java} > kp -n audit-exporter-eee5073aac -w > NAME READY STATUS RESTARTS AGE > audit-exporter-71489e843d8085c0-driver 1/1 Running 0 > 9m54s > audit-exporter-7e6b8b843d80b9e6-exec-1 1/1 Running 0 > 9m40s > data-io-120204843d899567-driver 0/1 Terminating 0 1s > data-io-120204843d899567-driver 0/1 Terminating 0 2s > data-io-120204843d899567-driver 0/1 Terminating 0 3s > data-io-120204843d899567-driver 0/1 Terminating 0 > 3s{code} > The error is: > {code:java} > {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38: > 'data-io'","msg":"Application failed with > exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException: > Failure executing: PUT at: > https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map. > Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: > Forbidden: field is immutable when `immutable` is set. 
Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: > field is immutable when `immutable` is set, reason=FieldValueForbidden, > additionalProperties={})], group=null, kind=ConfigMap, > name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=ConfigMap > \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is > immutable when `immutable` is set, metadata=ListMeta(_continue=null, > remainingItemCount=null, resourceVersion=null, selfLink=null, > additionalProperties={}), reason=Invalid, status=Failure, > additionalProperties={}).\n\tat > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat > > 
io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat > > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat > >
[jira] [Updated] (SPARK-46335) Upgrade Maven to 3.9.6 for MNG-7913
[ https://issues.apache.org/jira/browse/SPARK-46335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46335: --- Labels: pull-request-available (was: ) > Upgrade Maven to 3.9.6 for MNG-7913 > --- > > Key: SPARK-46335 > URL: https://issues.apache.org/jira/browse/SPARK-46335 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-46325) Remove unnecessary override functions when constructing `WrappedCloseableIterator` in `ResponseValidator#wrapIterator`
[ https://issues.apache.org/jira/browse/SPARK-46325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46325. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44255 [https://github.com/apache/spark/pull/44255] > Remove unnecessary override functions when constructing > `WrappedCloseableIterator` in `ResponseValidator#wrapIterator` > -- > > Key: SPARK-46325 > URL: https://issues.apache.org/jira/browse/SPARK-46325 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We should reuse the functions defined in {{WrappedCloseableIterator}} instead of > overriding them. > > *ResponseValidator#wrapIterator* > > {code:java} > def wrapIterator[T <: GeneratedMessageV3, V <: CloseableIterator[T]]( > inner: V): WrappedCloseableIterator[T] = { > new WrappedCloseableIterator[T] { > override def innerIterator: Iterator[T] = inner > override def hasNext: Boolean = { > innerIterator.hasNext > } > override def next(): T = { > verifyResponse { > innerIterator.next() > } > } > override def close(): Unit = { > innerIterator match { > case it: CloseableIterator[T] => it.close() > case _ => // nothing > } > } > } > } {code} > *WrappedCloseableIterator* > > {code:java} > private[sql] abstract class WrappedCloseableIterator[E] extends > CloseableIterator[E] { > def innerIterator: Iterator[E] > override def next(): E = innerIterator.next() > override def hasNext: Boolean = innerIterator.hasNext > override def close(): Unit = innerIterator match { > case it: CloseableIterator[E] => it.close() > case _ => // nothing > } > } {code}
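The simplification being applied here can be shown in miniature: because the base wrapper already delegates `hasNext` and `next` to the inner iterator, the wrapping site only needs to override `next()` to add verification. This is a simplified Java sketch of the same pattern, not Spark's actual Scala classes:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class ValidatingWrapper {
    // Base wrapper: delegates to the inner iterator, mirroring the role of
    // WrappedCloseableIterator (names here are illustrative).
    static abstract class WrappedIterator<E> implements Iterator<E> {
        abstract Iterator<E> inner();
        @Override public boolean hasNext() { return inner().hasNext(); }
        @Override public E next() { return inner().next(); }
    }

    // The shape of the fix: only next() is overridden to add verification;
    // hasNext() is inherited from the base wrapper instead of re-implemented.
    static <E> Iterator<E> wrap(Iterator<E> in, Consumer<E> verify) {
        return new WrappedIterator<E>() {
            @Override Iterator<E> inner() { return in; }
            @Override public E next() {
                E e = inner().next();
                verify.accept(e);  // stands in for verifyResponse
                return e;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> it = wrap(List.of(1, 2, 3).iterator(), e -> {});
        System.out.println(it.next()); // 1
    }
}
```

Duplicating the delegation logic in every anonymous subclass (as the pre-fix `wrapIterator` did) is the redundancy the issue removes; the base class already encodes it once.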
[jira] [Assigned] (SPARK-46325) Remove unnecessary override functions when constructing `WrappedCloseableIterator` in `ResponseValidator#wrapIterator`
[ https://issues.apache.org/jira/browse/SPARK-46325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46325: - Assignee: Yang Jie > Remove unnecessary override functions when constructing > `WrappedCloseableIterator` in `ResponseValidator#wrapIterator` > -- > > Key: SPARK-46325 > URL: https://issues.apache.org/jira/browse/SPARK-46325 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > > Should reuse functions defined in {{WrappedCloseableIterator}} instead of > overriding them > > *ResponseValidator#wrapIterator* > > {code:java} > def wrapIterator[T <: GeneratedMessageV3, V <: CloseableIterator[T]]( > inner: V): WrappedCloseableIterator[T] = { > new WrappedCloseableIterator[T] { > override def innerIterator: Iterator[T] = inner > override def hasNext: Boolean = { > innerIterator.hasNext > } > override def next(): T = { > verifyResponse { > innerIterator.next() > } > } > override def close(): Unit = { > innerIterator match { > case it: CloseableIterator[T] => it.close() > case _ => // nothing > } > } > } > } {code} > *WrappedCloseableIterator* > > {code:java} > private[sql] abstract class WrappedCloseableIterator[E] extends > CloseableIterator[E] { > def innerIterator: Iterator[E] > override def next(): E = innerIterator.next() > override def hasNext: Boolean = innerIterator.hasNext > override def close(): Unit = innerIterator match { > case it: CloseableIterator[E] => it.close() > case _ => // nothing > } > } {code}
[jira] [Updated] (SPARK-46334) Update pandas to 2.1.4
[ https://issues.apache.org/jira/browse/SPARK-46334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46334: --- Labels: pull-request-available (was: ) > Update pandas to 2.1.4 > -- > > Key: SPARK-46334 > URL: https://issues.apache.org/jira/browse/SPARK-46334 > Project: Spark > Issue Type: Dependency upgrade > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Bjørn Jørgensen >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-46334) Update pandas to 2.1.4
Bjørn Jørgensen created SPARK-46334: --- Summary: Update pandas to 2.1.4 Key: SPARK-46334 URL: https://issues.apache.org/jira/browse/SPARK-46334 Project: Spark Issue Type: Dependency upgrade Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Bjørn Jørgensen
[jira] [Resolved] (SPARK-46332) Migrate CatalogNotFoundException to an error class
[ https://issues.apache.org/jira/browse/SPARK-46332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46332. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44259 [https://github.com/apache/spark/pull/44259] > Migrate CatalogNotFoundException to an error class > -- > > Key: SPARK-46332 > URL: https://issues.apache.org/jira/browse/SPARK-46332 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Migrate the exception CatalogNotFoundException to an error class and > - prohibit creation of CatalogNotFoundException w/o an error class > - introduce a new error class > - create CatalogNotFoundException using the new error class
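The three bullet points above describe one structural change: the exception's only constructor takes an error class plus message parameters, so a free-form-message construction path no longer exists. A minimal Java sketch of that shape, with hypothetical names rather than Spark's real SparkThrowable machinery:

```java
import java.util.Map;

// Illustrative only: an exception that can only be built from an error class
// and named parameters, which is the property the migration enforces.
public class CatalogNotFoundSketch extends RuntimeException {
    private final String errorClass;

    CatalogNotFoundSketch(String errorClass, Map<String, String> params) {
        // The displayed message is rendered from the class and parameters,
        // never passed in as an arbitrary string.
        super(errorClass + " " + params);
        this.errorClass = errorClass;
    }

    public String getErrorClass() { return errorClass; }

    public static void main(String[] args) {
        CatalogNotFoundSketch e =
            new CatalogNotFoundSketch("CATALOG_NOT_FOUND", Map.of("catalogName", "my_catalog"));
        System.out.println(e.getErrorClass()); // CATALOG_NOT_FOUND
    }
}
```

The payoff of the pattern is that callers and tests can match on a stable error-class identifier instead of fragile message text.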
[jira] [Resolved] (SPARK-46324) Fix the output name of pyspark.sql.functions.user and session_user
[ https://issues.apache.org/jira/browse/SPARK-46324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46324. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44253 [https://github.com/apache/spark/pull/44253] > Fix the output name of pyspark.sql.functions.user and session_user > -- > > Key: SPARK-46324 > URL: https://issues.apache.org/jira/browse/SPARK-46324 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > scala> spark.range(1).select(user()).show() > +--+ > |current_user()| > +--+ > | hyukjin.kwon| > +--+ > {code}
[jira] [Assigned] (SPARK-46324) Fix the output name of pyspark.sql.functions.user and session_user
[ https://issues.apache.org/jira/browse/SPARK-46324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46324: - Assignee: Hyukjin Kwon > Fix the output name of pyspark.sql.functions.user and session_user > -- > > Key: SPARK-46324 > URL: https://issues.apache.org/jira/browse/SPARK-46324 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > {code} > scala> spark.range(1).select(user()).show() > +--+ > |current_user()| > +--+ > | hyukjin.kwon| > +--+ > {code}
[jira] [Resolved] (SPARK-46328) Allocate capacity of array list of TColumns by columns size in TRowSet generation
[ https://issues.apache.org/jira/browse/SPARK-46328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46328. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44258 [https://github.com/apache/spark/pull/44258] > Allocate capacity of array list of TColumns by columns size in TRowSet > generation > - > > Key: SPARK-46328 > URL: https://issues.apache.org/jira/browse/SPARK-46328 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Bowen Liang >Assignee: Bowen Liang >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Background: An ArrayList is created for the TColumn value collections in RowSetUtils during TRowSet generation. Currently, these lists are created with Java's default capacity of 16 rather than sized by the number of columns, which can cause array copying while assembling the TColumn collections once the column count exceeds the default capacity. > > Suggested solution: Allocate enough capacity, based on the column count, for assembling the array list of TColumns in TRowSet generation.
[jira] [Assigned] (SPARK-46328) Allocate capacity of array list of TColumns by columns size in TRowSet generation
[ https://issues.apache.org/jira/browse/SPARK-46328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46328: - Assignee: Bowen Liang > Allocate capacity of array list of TColumns by columns size in TRowSet > generation > - > > Key: SPARK-46328 > URL: https://issues.apache.org/jira/browse/SPARK-46328 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Bowen Liang >Assignee: Bowen Liang >Priority: Minor > Labels: pull-request-available > > Background: An ArrayList is created for the TColumn value collections in RowSetUtils during TRowSet generation. Currently, these lists are created with Java's default capacity of 16 rather than sized by the number of columns, which can cause array copying while assembling the TColumn collections once the column count exceeds the default capacity. > > Suggested solution: Allocate enough capacity, based on the column count, for assembling the array list of TColumns in TRowSet generation.
[jira] [Updated] (SPARK-46328) Allocate capacity of array list of TColumns by columns size in TRowSet generation
[ https://issues.apache.org/jira/browse/SPARK-46328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46328: -- Affects Version/s: 4.0.0 (was: 3.5.0) > Allocate capacity of array list of TColumns by columns size in TRowSet > generation > - > > Key: SPARK-46328 > URL: https://issues.apache.org/jira/browse/SPARK-46328 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Bowen Liang >Priority: Minor > Labels: pull-request-available > > Background: > ArrayLists are created for TColumn value collections in RowSetUtils during TRowSet > generation. Currently, they are created with Java's default capacity of 16 > rather than sized by the number of columns, which can cause repeated array copying > while assembling each TColumn collection once the column count exceeds the > default capacity. > > Suggested solution: > Allocate sufficient capacity, based on the column count, for the array lists of > TColumns assembled during TRowSet generation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
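The reallocation cost described in SPARK-46328 can be illustrated outside of Spark. This is a hypothetical sketch, not Spark code: Python lists grow geometrically much like Java's ArrayList, so appending one element at a time triggers periodic buffer reallocations, while preallocating to the known size (analogous to `new ArrayList<>(columnCount)` in Java) avoids them.

```python
import sys

def count_reallocations(n):
    """Append n items one by one and count how often the list's
    underlying buffer is reallocated (its reported byte size changes)."""
    xs, reallocs, last = [], 0, sys.getsizeof([])
    for i in range(n):
        xs.append(i)
        size = sys.getsizeof(xs)
        if size != last:
            reallocs += 1
            last = size
    return reallocs

# Growing incrementally reallocates the buffer several times...
assert count_reallocations(1000) > 1
# ...while preallocating to the known size avoids that entirely,
# which is the idea behind sizing the TColumn lists by column count.
prealloc = [None] * 1000
assert len(prealloc) == 1000
```

The actual fix in Spark presumably passes the column count to the ArrayList capacity constructor; the sketch only demonstrates why the default capacity of 16 causes copying once the column count exceeds it.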
[jira] [Updated] (SPARK-46332) Migrate CatalogNotFoundException to an error class
[ https://issues.apache.org/jira/browse/SPARK-46332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46332: --- Labels: pull-request-available (was: ) > Migrate CatalogNotFoundException to an error class > -- > > Key: SPARK-46332 > URL: https://issues.apache.org/jira/browse/SPARK-46332 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > Migrate the exception CatalogNotFoundException to an error class and: > - prohibit creation of CatalogNotFoundException without an error class > - introduce a new error class > - create CatalogNotFoundException using the new error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46132) [CORE] Support key password for JKS keys for RPC SSL
[ https://issues.apache.org/jira/browse/SPARK-46132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46132: --- Labels: pull-request-available (was: ) > [CORE] Support key password for JKS keys for RPC SSL > > > Key: SPARK-46132 > URL: https://issues.apache.org/jira/browse/SPARK-46132 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > > See thread at > https://github.com/apache/spark/pull/43998#discussion_r1406993411 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46289) Exception when ordering by UDT in interpreted mode
[ https://issues.apache.org/jira/browse/SPARK-46289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-46289: -- Priority: Minor (was: Major) > Exception when ordering by UDT in interpreted mode > -- > > Key: SPARK-46289 > URL: https://issues.apache.org/jira/browse/SPARK-46289 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.2, 3.5.0 >Reporter: Bruce Robbins >Priority: Minor > > In interpreted mode, ordering by a UDT will result in an exception. For > example: > {noformat} > import org.apache.spark.ml.linalg.{DenseVector, Vector} > val df = Seq.tabulate(30) { x => > (x, x + 1, x + 2, new DenseVector(Array((x/100.0).toDouble, ((x + > 1)/100.0).toDouble, ((x + 3)/100.0).toDouble))) > }.toDF("id", "c1", "c2", "c3") > df.createOrReplaceTempView("df") > // this works > sql("select * from df order by c3").collect > sql("set spark.sql.codegen.wholeStage=false") > sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") > // this gets an error > sql("select * from df order by c3").collect > {noformat} > The second {{collect}} action results in the following exception: > {noformat} > org.apache.spark.SparkIllegalArgumentException: Type > UninitializedPhysicalType does not support ordered operations. 
> at > org.apache.spark.sql.errors.QueryExecutionErrors$.orderedOperationUnsupportedByDataTypeError(QueryExecutionErrors.scala:348) > at > org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:332) > at > org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:329) > at > org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:60) > at > org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:39) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter$RowComparator.compare(UnsafeExternalRowSorter.java:254) > {noformat} > Note: You don't get an error if you use {{show}} rather than {{collect}}. > This is because {{show}} will implicitly add a {{limit}}, in which case the > ordering is performed by {{TakeOrderedAndProject}} rather than > {{UnsafeExternalRowSorter}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45599) Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset
[ https://issues.apache.org/jira/browse/SPARK-45599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45599: -- Priority: Critical (was: Blocker) > Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset > -- > > Key: SPARK-45599 > URL: https://issues.apache.org/jira/browse/SPARK-45599 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.2.3, 3.5.0 >Reporter: Robert Joseph Evans >Priority: Critical > Labels: data-corruption > > I think this actually impacts all versions that have ever supported > percentile and it may impact other things because the bug is in OpenHashMap. > > I am really surprised that we caught this bug because everything has to hit > just wrong to make it happen. in python/pyspark if you run > > {code:python} > from math import * > from pyspark.sql.types import * > data = [(1.779652973678931e+173,), (9.247723870123388e-295,), > (5.891823952773268e+98,), (inf,), (1.9042708096454302e+195,), > (-3.085825028509117e+74,), (-1.9569489404314425e+128,), > (2.0738138203216883e+201,), (inf,), (2.5212410617263588e-282,), > (-2.646144697462316e-35,), (-3.468683249247593e-196,), (nan,), (None,), > (nan,), (1.822129180806602e-245,), (5.211702553315461e-259,), (-1.0,), > (-5.682293414619055e+46,), (-4.585039307326895e+166,), > (-5.936844510098297e-82,), (-5234708055733.116,), (4920675036.053339,), > (None,), (4.4501477170144023e-308,), (2.176024662699802e-210,), > (-5.046677974902737e+132,), (-5.490780063080251e-09,), > (1.703824427218836e-55,), (-1.1961155424160076e+102,), > (1.4403274475565667e+41,), (None,), (5.4470705929955455e-86,), > (5.120795466142678e-215,), (-9.01991342808203e+282,), > (4.051866849943636e-254,), (-3588518231990.927,), (-1.8891559842111865e+63,), > (3.4543959813437507e-304,), (-7.590734560275502e-63,), > (9.376528689861087e+117,), (-2.1696969883753554e-292,), > (7.227411393136537e+206,), (-2.428999624265911e-293,), > 
(5.741383583382542e-14,), (-1.4882040107841963e+286,), > (2.1973064836362255e-159,), (0.028096279323357867,), > (8.475809563703283e-64,), (3.002803065141241e-139,), > (-1.1041009815645263e+203,), (1.8461539468514548e-225,), > (-5.620339412794757e-251,), (3.5103766991437114e-60,), > (2.4925669515657655e+165,), (3.217759099462207e+108,), > (-8.796717685143486e+203,), (2.037360925124577e+292,), > (-6.542279108216022e+206,), (-7.951172614280046e-74,), > (6.226527569272003e+152,), (-5.673977270111637e-84,), > (-1.0186016078084965e-281,), (1.7976931348623157e+308,), > (4.205809391029644e+137,), (-9.871721037428167e+119,), (None,), > (-1.6663254121185628e-256,), (1.0075153091760986e-236,), (-0.0,), (0.0,), > (1.7976931348623157e+308,), (4.3214483342777574e-117,), > (-7.973642629411105e-89,), (-1.1028137694801181e-297,), > (2.9000325280299273e-39,), (-1.077534929323113e-264,), > (-1.1847952892216515e+137,), (nan,), (7.849390806334983e+226,), > (-1.831402251805194e+65,), (-2.664533698035492e+203,), > (-2.2385155698231885e+285,), (-2.3016388448634844e-155,), > (-9.607772864590422e+217,), (3.437191836077251e+209,), > (1.9846569552093057e-137,), (-3.010452936419635e-233,), > (1.4309793775440402e-87,), (-2.9383643865423363e-103,), > (-4.696878567317712e-162,), (8.391630779050713e-135,), (nan,), > (-3.3885098786542755e-128,), (-4.5154178008513483e-122,), (nan,), (nan,), > (2.187766760184779e+306,), (7.679268835670585e+223,), > (6.3131466321042515e+153,), (1.779652973678931e+173,), > (9.247723870123388e-295,), (5.891823952773268e+98,), (inf,), > (1.9042708096454302e+195,), (-3.085825028509117e+74,), > (-1.9569489404314425e+128,), (2.0738138203216883e+201,), (inf,), > (2.5212410617263588e-282,), (-2.646144697462316e-35,), > (-3.468683249247593e-196,), (nan,), (None,), (nan,), > (1.822129180806602e-245,), (5.211702553315461e-259,), (-1.0,), > (-5.682293414619055e+46,), (-4.585039307326895e+166,), > (-5.936844510098297e-82,), (-5234708055733.116,), (4920675036.053339,), > (None,), 
(4.4501477170144023e-308,), (2.176024662699802e-210,), > (-5.046677974902737e+132,), (-5.490780063080251e-09,), > (1.703824427218836e-55,), (-1.1961155424160076e+102,), > (1.4403274475565667e+41,), (None,), (5.4470705929955455e-86,), > (5.120795466142678e-215,), (-9.01991342808203e+282,), > (4.051866849943636e-254,), (-3588518231990.927,), (-1.8891559842111865e+63,), > (3.4543959813437507e-304,), (-7.590734560275502e-63,), > (9.376528689861087e+117,), (-2.1696969883753554e-292,), > (7.227411393136537e+206,), (-2.428999624265911e-293,), > (5.741383583382542e-14,), (-1.4882040107841963e+286,), > (2.1973064836362255e-159,), (0.028096279323357867,), > (8.475809563703283e-64,), (3.002803065141241e-139,), >
[jira] [Updated] (SPARK-45599) Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset
[ https://issues.apache.org/jira/browse/SPARK-45599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45599: -- Affects Version/s: 1.6.3 1.4.1 > Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset > -- > > Key: SPARK-45599 > URL: https://issues.apache.org/jira/browse/SPARK-45599 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.6.3, 3.3.0, 3.2.3, 3.5.0 >Reporter: Robert Joseph Evans >Priority: Critical > Labels: data-corruption > > I think this actually impacts all versions that have ever supported > percentile and it may impact other things because the bug is in OpenHashMap. > > I am really surprised that we caught this bug because everything has to hit > just wrong to make it happen. in python/pyspark if you run > > {code:python} > from math import * > from pyspark.sql.types import * > data = [(1.779652973678931e+173,), (9.247723870123388e-295,), > (5.891823952773268e+98,), (inf,), (1.9042708096454302e+195,), > (-3.085825028509117e+74,), (-1.9569489404314425e+128,), > (2.0738138203216883e+201,), (inf,), (2.5212410617263588e-282,), > (-2.646144697462316e-35,), (-3.468683249247593e-196,), (nan,), (None,), > (nan,), (1.822129180806602e-245,), (5.211702553315461e-259,), (-1.0,), > (-5.682293414619055e+46,), (-4.585039307326895e+166,), > (-5.936844510098297e-82,), (-5234708055733.116,), (4920675036.053339,), > (None,), (4.4501477170144023e-308,), (2.176024662699802e-210,), > (-5.046677974902737e+132,), (-5.490780063080251e-09,), > (1.703824427218836e-55,), (-1.1961155424160076e+102,), > (1.4403274475565667e+41,), (None,), (5.4470705929955455e-86,), > (5.120795466142678e-215,), (-9.01991342808203e+282,), > (4.051866849943636e-254,), (-3588518231990.927,), (-1.8891559842111865e+63,), > (3.4543959813437507e-304,), (-7.590734560275502e-63,), > (9.376528689861087e+117,), (-2.1696969883753554e-292,), > (7.227411393136537e+206,), (-2.428999624265911e-293,), > 
(5.741383583382542e-14,), (-1.4882040107841963e+286,), > (2.1973064836362255e-159,), (0.028096279323357867,), > (8.475809563703283e-64,), (3.002803065141241e-139,), > (-1.1041009815645263e+203,), (1.8461539468514548e-225,), > (-5.620339412794757e-251,), (3.5103766991437114e-60,), > (2.4925669515657655e+165,), (3.217759099462207e+108,), > (-8.796717685143486e+203,), (2.037360925124577e+292,), > (-6.542279108216022e+206,), (-7.951172614280046e-74,), > (6.226527569272003e+152,), (-5.673977270111637e-84,), > (-1.0186016078084965e-281,), (1.7976931348623157e+308,), > (4.205809391029644e+137,), (-9.871721037428167e+119,), (None,), > (-1.6663254121185628e-256,), (1.0075153091760986e-236,), (-0.0,), (0.0,), > (1.7976931348623157e+308,), (4.3214483342777574e-117,), > (-7.973642629411105e-89,), (-1.1028137694801181e-297,), > (2.9000325280299273e-39,), (-1.077534929323113e-264,), > (-1.1847952892216515e+137,), (nan,), (7.849390806334983e+226,), > (-1.831402251805194e+65,), (-2.664533698035492e+203,), > (-2.2385155698231885e+285,), (-2.3016388448634844e-155,), > (-9.607772864590422e+217,), (3.437191836077251e+209,), > (1.9846569552093057e-137,), (-3.010452936419635e-233,), > (1.4309793775440402e-87,), (-2.9383643865423363e-103,), > (-4.696878567317712e-162,), (8.391630779050713e-135,), (nan,), > (-3.3885098786542755e-128,), (-4.5154178008513483e-122,), (nan,), (nan,), > (2.187766760184779e+306,), (7.679268835670585e+223,), > (6.3131466321042515e+153,), (1.779652973678931e+173,), > (9.247723870123388e-295,), (5.891823952773268e+98,), (inf,), > (1.9042708096454302e+195,), (-3.085825028509117e+74,), > (-1.9569489404314425e+128,), (2.0738138203216883e+201,), (inf,), > (2.5212410617263588e-282,), (-2.646144697462316e-35,), > (-3.468683249247593e-196,), (nan,), (None,), (nan,), > (1.822129180806602e-245,), (5.211702553315461e-259,), (-1.0,), > (-5.682293414619055e+46,), (-4.585039307326895e+166,), > (-5.936844510098297e-82,), (-5234708055733.116,), (4920675036.053339,), > (None,), 
(4.4501477170144023e-308,), (2.176024662699802e-210,), > (-5.046677974902737e+132,), (-5.490780063080251e-09,), > (1.703824427218836e-55,), (-1.1961155424160076e+102,), > (1.4403274475565667e+41,), (None,), (5.4470705929955455e-86,), > (5.120795466142678e-215,), (-9.01991342808203e+282,), > (4.051866849943636e-254,), (-3588518231990.927,), (-1.8891559842111865e+63,), > (3.4543959813437507e-304,), (-7.590734560275502e-63,), > (9.376528689861087e+117,), (-2.1696969883753554e-292,), > (7.227411393136537e+206,), (-2.428999624265911e-293,), > (5.741383583382542e-14,), (-1.4882040107841963e+286,), > (2.1973064836362255e-159,), (0.028096279323357867,), > (8.475809563703283e-64,),
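The likely mechanism behind the OpenHashMap bug described in SPARK-45599 can be seen in miniature: -0.0 and 0.0 compare equal under IEEE-754, yet their bit patterns differ in the sign bit, so any hashing or equality check keyed on the raw bits treats them as two distinct values while sorting treats them as one. A minimal Python sketch (the `struct` round-trip stands in for Java's `Double.doubleToRawLongBits`; this is an illustration, not the Spark code path):

```python
import struct

def double_bits(x):
    """Raw IEEE-754 bit pattern of a double as a signed 64-bit int,
    analogous to Java's Double.doubleToRawLongBits."""
    return struct.unpack(">q", struct.pack(">d", x))[0]

# -0.0 and 0.0 compare equal, so a sort puts them in one equivalence class...
assert -0.0 == 0.0
# ...but their bit patterns differ (only in the sign bit), so a hash map
# keyed on raw bits sees two distinct keys and splits their counts.
assert double_bits(0.0) == 0
assert double_bits(-0.0) == -(2**63)
assert double_bits(-0.0) != double_bits(0.0)
```

Splitting one logical key across two hash-map entries would skew the per-value counts that percentile aggregates over, which is consistent with the wrong answers reported when -0.0 and 0.0 are mixed in the dataset.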
[jira] [Commented] (SPARK-44900) Cached DataFrame keeps growing
[ https://issues.apache.org/jira/browse/SPARK-44900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794817#comment-17794817 ] Dongjoon Hyun commented on SPARK-44900: --- Could you try this with Apache Spark 3.5.0, please? > Cached DataFrame keeps growing > -- > > Key: SPARK-44900 > URL: https://issues.apache.org/jira/browse/SPARK-44900 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Varun Nalla >Priority: Major > > Scenario: > We have a Kafka streaming application where data lookups are performed by > joining against another DataFrame that is cached, with caching strategy > MEMORY_AND_DISK. > However, the size of the cached DataFrame keeps growing with every micro-batch > the streaming application processes, which is visible under the Storage tab. > A similar Stack Overflow thread was already raised: > https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44772) Reading blocks from remote executors causes timeout issue
[ https://issues.apache.org/jira/browse/SPARK-44772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44772. --- Resolution: Cannot Reproduce > Reading blocks from remote executors causes timeout issue > -- > > Key: SPARK-44772 > URL: https://issues.apache.org/jira/browse/SPARK-44772 > Project: Spark > Issue Type: Bug > Components: EC2, PySpark, Shuffle, Spark Core >Affects Versions: 3.1.2 >Reporter: nebi mert aydin >Priority: Major > > I'm using EMR 6.5 with Spark 3.1.2 > I'm shuffling 1.5 TiB of data with 3000 executors with 4 cores 23 gig memory > for executors > Also speculative mode is on. > {code:java} > // df.repartition(6000) {code} > I see lots of failures with > {code:java} > 2023-08-11 01:01:09,846 ERROR > org.apache.spark.network.server.ChunkFetchRequestHandler > (shuffle-server-4-95): Error sending result > ChunkFetchSuccess[streamChunkId=StreamChunkId[streamId=779084003612,chunkIndex=323],buffer=FileSegmentManagedBuffer[file=/mnt3/yarn/usercache/zeppelin/appcache/application_1691438567823_0012/blockmgr-0d82ca05-9429-4ff2-9f61-e779e8e60648/07/shuffle_5_114492_0.data,offset=1836997,length=618]] > to /172.31.20.110:36654; closing connection > java.nio.channels.ClosedChannelException > at > org.sparkproject.io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) > at > org.sparkproject.io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) > at > org.sparkproject.io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792) > at > 
org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) > at > org.sparkproject.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:110) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) > at > org.sparkproject.io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:302) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808) > at > org.sparkproject.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025) > at > org.sparkproject.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294) > at > org.apache.spark.network.server.ChunkFetchRequestHandler.respond(ChunkFetchRequestHandler.java:142) > at > org.apache.spark.network.server.ChunkFetchRequestHandler.processFetchRequest(ChunkFetchRequestHandler.java:116) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:107) > at > 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > org.sparkproject.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at >
[jira] [Updated] (SPARK-44900) Cached DataFrame keeps growing
[ https://issues.apache.org/jira/browse/SPARK-44900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44900: -- Priority: Major (was: Blocker) > Cached DataFrame keeps growing > -- > > Key: SPARK-44900 > URL: https://issues.apache.org/jira/browse/SPARK-44900 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Varun Nalla >Priority: Major > > Scenario: > We have a Kafka streaming application where data lookups are performed by > joining against another DataFrame that is cached, with caching strategy > MEMORY_AND_DISK. > However, the size of the cached DataFrame keeps growing with every micro-batch > the streaming application processes, which is visible under the Storage tab. > A similar Stack Overflow thread was already raised: > https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46333) Replace IllegalStateException by SparkException.internalError in catalyst
Max Gekk created SPARK-46333: Summary: Replace IllegalStateException by SparkException.internalError in catalyst Key: SPARK-46333 URL: https://issues.apache.org/jira/browse/SPARK-46333 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Replace IllegalStateException by SparkException.internalError in catalyst as a part of migration onto new error framework and error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44772) Reading blocks from remote executors causes timeout issue
[ https://issues.apache.org/jira/browse/SPARK-44772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794816#comment-17794816 ] Dongjoon Hyun commented on SPARK-44772: --- Could you try this with Apache Spark 3.5.0? > Reading blocks from remote executors causes timeout issue > -- > > Key: SPARK-44772 > URL: https://issues.apache.org/jira/browse/SPARK-44772 > Project: Spark > Issue Type: Bug > Components: EC2, PySpark, Shuffle, Spark Core >Affects Versions: 3.1.2 >Reporter: nebi mert aydin >Priority: Blocker > > I'm using EMR 6.5 with Spark 3.1.2 > I'm shuffling 1.5 TiB of data with 3000 executors with 4 cores 23 gig memory > for executors > Also speculative mode is on. > {code:java} > // df.repartition(6000) {code} > I see lots of failures with > {code:java} > 2023-08-11 01:01:09,846 ERROR > org.apache.spark.network.server.ChunkFetchRequestHandler > (shuffle-server-4-95): Error sending result > ChunkFetchSuccess[streamChunkId=StreamChunkId[streamId=779084003612,chunkIndex=323],buffer=FileSegmentManagedBuffer[file=/mnt3/yarn/usercache/zeppelin/appcache/application_1691438567823_0012/blockmgr-0d82ca05-9429-4ff2-9f61-e779e8e60648/07/shuffle_5_114492_0.data,offset=1836997,length=618]] > to /172.31.20.110:36654; closing connection > java.nio.channels.ClosedChannelException > at > org.sparkproject.io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) > at > org.sparkproject.io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) > at > org.sparkproject.io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709) > at > 
org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) > at > org.sparkproject.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:110) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) > at > org.sparkproject.io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:302) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808) > at > org.sparkproject.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025) > at > org.sparkproject.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294) > at > org.apache.spark.network.server.ChunkFetchRequestHandler.respond(ChunkFetchRequestHandler.java:142) > at > org.apache.spark.network.server.ChunkFetchRequestHandler.processFetchRequest(ChunkFetchRequestHandler.java:116) > at > 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:107) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > org.sparkproject.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at >
[jira] [Updated] (SPARK-44772) Reading blocks from remote executors causes timeout issue
[ https://issues.apache.org/jira/browse/SPARK-44772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44772: -- Priority: Major (was: Blocker) > Reading blocks from remote executors causes timeout issue > -- > > Key: SPARK-44772 > URL: https://issues.apache.org/jira/browse/SPARK-44772 > Project: Spark > Issue Type: Bug > Components: EC2, PySpark, Shuffle, Spark Core >Affects Versions: 3.1.2 >Reporter: nebi mert aydin >Priority: Major > > I'm using EMR 6.5 with Spark 3.1.2 > I'm shuffling 1.5 TiB of data with 3000 executors with 4 cores 23 gig memory > for executors > Also speculative mode is on. > {code:java} > // df.repartition(6000) {code} > I see lots of failures with > {code:java} > 2023-08-11 01:01:09,846 ERROR > org.apache.spark.network.server.ChunkFetchRequestHandler > (shuffle-server-4-95): Error sending result > ChunkFetchSuccess[streamChunkId=StreamChunkId[streamId=779084003612,chunkIndex=323],buffer=FileSegmentManagedBuffer[file=/mnt3/yarn/usercache/zeppelin/appcache/application_1691438567823_0012/blockmgr-0d82ca05-9429-4ff2-9f61-e779e8e60648/07/shuffle_5_114492_0.data,offset=1836997,length=618]] > to /172.31.20.110:36654; closing connection > java.nio.channels.ClosedChannelException > at > org.sparkproject.io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) > at > org.sparkproject.io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) > at > org.sparkproject.io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792) > at > 
org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) > at > org.sparkproject.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:110) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) > at > org.sparkproject.io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:302) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808) > at > org.sparkproject.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025) > at > org.sparkproject.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294) > at > org.apache.spark.network.server.ChunkFetchRequestHandler.respond(ChunkFetchRequestHandler.java:142) > at > org.apache.spark.network.server.ChunkFetchRequestHandler.processFetchRequest(ChunkFetchRequestHandler.java:116) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:107) > at > 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > org.sparkproject.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at >
[jira] [Created] (SPARK-46332) Migrate CatalogNotFoundException to an error class
Max Gekk created SPARK-46332: Summary: Migrate CatalogNotFoundException to an error class Key: SPARK-46332 URL: https://issues.apache.org/jira/browse/SPARK-46332 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Migrate the exception CatalogNotFoundException to an error class: - prohibit creation of CatalogNotFoundException w/o an error class - introduce a new error class - create CatalogNotFoundException using the new error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
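The three migration steps above can be sketched as a small, hypothetical Python analog. Spark's actual implementation is in Scala and keeps its error classes in a JSON registry; the `CATALOG_NOT_FOUND` name and message text below are made up for illustration only.

```python
# Hypothetical registry; Spark's real one is error-classes.json (Scala side).
ERROR_CLASSES = {
    "CATALOG_NOT_FOUND": "The catalog '{catalogName}' was not found.",
}

class CatalogNotFoundException(Exception):
    def __init__(self, error_class: str, message_parameters: dict):
        # Step 1: prohibit creation without a registered error class.
        if error_class not in ERROR_CLASSES:
            raise ValueError(f"Unknown error class: {error_class}")
        self.error_class = error_class
        # Step 3: the user-facing message is rendered from the error class
        # template plus the supplied parameters.
        super().__init__(ERROR_CLASSES[error_class].format(**message_parameters))

err = CatalogNotFoundException("CATALOG_NOT_FOUND", {"catalogName": "my_cat"})
print(err.error_class, "-", err)
```

The point of the pattern is that callers can match on a stable `error_class` identifier instead of parsing free-form message strings.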
[jira] [Updated] (SPARK-46331) Removing CodeGenFallback trait from subset of datetime and spark version functions
[ https://issues.apache.org/jira/browse/SPARK-46331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46331: --- Labels: pull-request-available (was: ) > Removing CodeGenFallback trait from subset of datetime and spark version > functions > -- > > Key: SPARK-46331 > URL: https://issues.apache.org/jira/browse/SPARK-46331 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Priority: Major > Labels: pull-request-available > > This change moves us further in the direction of removing CodegenFallback and > instead using RuntimeReplaceable with StaticInvoke, which will directly insert > the provided code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46327) Reorganize `SeriesStringTests`
[ https://issues.apache.org/jira/browse/SPARK-46327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46327. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44257 [https://github.com/apache/spark/pull/44257] > Reorganize `SeriesStringTests` > -- > > Key: SPARK-46327 > URL: https://issues.apache.org/jira/browse/SPARK-46327 > Project: Spark > Issue Type: Test > Components: Connect, PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46327) Reorganize `SeriesStringTests`
[ https://issues.apache.org/jira/browse/SPARK-46327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46327: Assignee: Ruifeng Zheng > Reorganize `SeriesStringTests` > -- > > Key: SPARK-46327 > URL: https://issues.apache.org/jira/browse/SPARK-46327 > Project: Spark > Issue Type: Test > Components: Connect, PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46331) Removing CodeGenFallback trait from subset of datetime and spark version functions
Aleksandar Tomic created SPARK-46331: Summary: Removing CodeGenFallback trait from subset of datetime and spark version functions Key: SPARK-46331 URL: https://issues.apache.org/jira/browse/SPARK-46331 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Aleksandar Tomic This change moves us further into direction of removing CodegenFallback and instead using RuntimeReplacable with StaticInvoke which will directly insert provided code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
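The CodegenFallback-vs-StaticInvoke distinction described in this issue can be illustrated with a toy Python analog. All class and function names here are hypothetical stand-ins: Spark's real expressions are Scala classes, and `StaticInvoke` actually emits JVM code rather than wrapping a Python callable.

```python
def spark_version():
    # Stand-in for the static method a StaticInvoke node would call directly.
    return "4.0.0"

class CodegenFallbackExpr:
    """Row-at-a-time interpreted evaluation -- the slow path being removed."""
    def eval(self, row):
        return spark_version()

class StaticInvokeExpr:
    """Directly bound call, analogous to code inserted by codegen."""
    def __init__(self, fn):
        self.fn = fn
    def eval(self, row):
        return self.fn()

def replace_at_runtime(expr):
    # RuntimeReplaceable-style rewrite: the optimizer swaps the fallback
    # expression for a direct invocation before execution.
    if isinstance(expr, CodegenFallbackExpr):
        return StaticInvokeExpr(spark_version)
    return expr

optimized = replace_at_runtime(CodegenFallbackExpr())
print(optimized.eval(None))
```

Both paths return the same value; the benefit of the rewrite is that the generated code calls the target method directly instead of dropping back to the interpreter for every row.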
[jira] [Assigned] (SPARK-46326) Test missing cases for functions (pyspark.sql.functions)
[ https://issues.apache.org/jira/browse/SPARK-46326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46326: Assignee: Hyukjin Kwon > Test missing cases for functions (pyspark.sql.functions) > > > Key: SPARK-46326 > URL: https://issues.apache.org/jira/browse/SPARK-46326 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > https://app.codecov.io/gh/apache/spark/blob/master/python%2Fpyspark%2Fsql%2Fsession.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46326) Test missing cases for functions (pyspark.sql.functions)
[ https://issues.apache.org/jira/browse/SPARK-46326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46326. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44256 [https://github.com/apache/spark/pull/44256] > Test missing cases for functions (pyspark.sql.functions) > > > Key: SPARK-46326 > URL: https://issues.apache.org/jira/browse/SPARK-46326 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > https://app.codecov.io/gh/apache/spark/blob/master/python%2Fpyspark%2Fsql%2Fsession.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46330) Loading of Spark UI blocks for a long time when HybridStore enabled
[ https://issues.apache.org/jira/browse/SPARK-46330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46330: --- Labels: pull-request-available (was: ) > Loading of Spark UI blocks for a long time when HybridStore enabled > --- > > Key: SPARK-46330 > URL: https://issues.apache.org/jira/browse/SPARK-46330 > Project: Spark > Issue Type: Bug > Components: UI >Affects Versions: 3.1.2 >Reporter: Zhou Yifan >Priority: Major > Labels: pull-request-available > > In our SparkHistoryServer, we use these two properties to speed up Spark UI > loading: > {code:java} > spark.history.store.hybridStore.enabled true > spark.history.store.hybridStore.maxMemoryUsage 16g {code} > Occasionally, we found it took minutes to load a small eventlog that usually > takes seconds. > In the jstack output of SparkHistoryServer, we found that 4 threads were > blocked waiting to lock the > *org.apache.spark.deploy.history.FsHistoryProvider* object monitor, which was > locked by thread "spark-history-task-0" closing a HybridStore. 
> {code:java} > "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 > tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry > [0x7f3f6476] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386) > - waiting to lock <0x0004c64433f0> (a > org.apache.spark.deploy.history.FsHistoryProvider) > at > org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194) > at > org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182) > at > org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown > Source) > at > org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154) > at > org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > - locked <0x00066effc3e8> (a > org.sparkproject.guava.cache.LocalCache$StrongAccessEntry) > at > org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108) > at > 
org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120) > at > org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251) > at > org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99) > "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 > nid=0x1ac6 in Object.wait() [0x7f41b2cc9000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1252) > - locked <0x00063ccbc9f0> (a java.lang.Thread) > at java.lang.Thread.join(Thread.java:1326) > at > org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106) > at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911) > at > org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown > Source) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911) > - locked <0x0004c64433f0> (a > org.apache.spark.deploy.history.FsHistoryProvider) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7(FsHistoryProvider.scala:541) > at >
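The contention pattern in the jstack output above, one thread holding the provider's monitor through a slow `close()` while UI request threads block on the same monitor, can be reproduced with a minimal sketch. This is an illustration of the locking pattern only, with hypothetical names; it is not Spark code.

```python
import threading
import time

lock = threading.Lock()        # plays the role of the FsHistoryProvider monitor
closing = threading.Event()
events = []

def close_store():
    # Holds the provider lock for the whole close, like invalidateUI ->
    # HybridStore#close draining buffered data to disk.
    with lock:
        events.append("close:start")
        closing.set()          # signal that the lock is now held
        time.sleep(0.2)        # stands in for the slow disk write
        events.append("close:end")

def get_app_ui():
    # UI request threads (getAppUI) block here until the close finishes.
    with lock:
        events.append("getAppUI")

closer = threading.Thread(target=close_store)
closer.start()
closing.wait()                 # ensure the closer owns the lock first
ui = threading.Thread(target=get_app_ui)
ui.start()
closer.join()
ui.join()
print(events)
```

The UI thread can only append its event after the close releases the lock, which is exactly why a long-running `HybridStore` close stalls every page load on the history server.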
[jira] (SPARK-46330) Loading of Spark UI blocks for a long time when HybridStore enabled
[ https://issues.apache.org/jira/browse/SPARK-46330 ] Zhou Yifan deleted comment on SPARK-46330: was (Author: zhouyifan279): *HybridStore#close,* may took long if there was still a lot of data waiting to be written to disk when closing. I tried a 1.64 GB eventlog. It took 93944 ms to write all data to disk. > Loading of Spark UI blocks for a long time when HybridStore enabled > --- > > Key: SPARK-46330 > URL: https://issues.apache.org/jira/browse/SPARK-46330 > Project: Spark > Issue Type: Bug > Components: UI >Affects Versions: 3.1.2 >Reporter: Zhou Yifan >Priority: Major > > In our SparkHistoryServer, we used these two property to speed up Spark UI's > loading: > {code:java} > spark.history.store.hybridStore.enabled true > spark.history.store.hybridStore.maxMemoryUsage 16g {code} > Occasionally, we found it took minutes to load a small eventlog which usually > took seconds. > In the jstack output of SparkHistoryServer, we found that 4 threads were > blocked and waiting to lock > *org.apache.spark.deploy.history.FsHistoryProvider* object monitor, which was > locked by thread "spark-history-task-0" closing a HybridStore. 
> {code:java} > "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 > tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry > [0x7f3f6476] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386) > - waiting to lock <0x0004c64433f0> (a > org.apache.spark.deploy.history.FsHistoryProvider) > at > org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194) > at > org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182) > at > org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown > Source) > at > org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154) > at > org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > - locked <0x00066effc3e8> (a > org.sparkproject.guava.cache.LocalCache$StrongAccessEntry) > at > org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108) > at > 
org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120) > at > org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251) > at > org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99) > "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 > nid=0x1ac6 in Object.wait() [0x7f41b2cc9000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1252) > - locked <0x00063ccbc9f0> (a java.lang.Thread) > at java.lang.Thread.join(Thread.java:1326) > at > org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106) > at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911) > at > org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown > Source) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911) > - locked <0x0004c64433f0> (a > org.apache.spark.deploy.history.FsHistoryProvider) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7(FsHistoryProvider.scala:541) > at >
[jira] [Updated] (SPARK-46330) Loading of Spark UI blocks for a long time when HybridStore enabled
[ https://issues.apache.org/jira/browse/SPARK-46330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhou Yifan updated SPARK-46330: --- Description: In our SparkHistoryServer, we used these two property to speed up Spark UI's loading: {code:java} spark.history.store.hybridStore.enabled true spark.history.store.hybridStore.maxMemoryUsage 16g {code} Occasionally, we found it took minutes to load a small eventlog which usually took seconds. In the jstack output of SparkHistoryServer, we found that 4 threads were blocked and waiting to lock *org.apache.spark.deploy.history.FsHistoryProvider* object monitor, which was locked by thread "spark-history-task-0" closing a HybridStore. {code:java} "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry [0x7f3f6476] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386) - waiting to lock <0x0004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider) at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194) at org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182) at org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown Source) at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154) at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58) at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) at 
org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) - locked <0x00066effc3e8> (a org.sparkproject.guava.cache.LocalCache$StrongAccessEntry) at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108) at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120) at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251) at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99) "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 nid=0x1ac6 in Object.wait() [0x7f41b2cc9000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1252) - locked <0x00063ccbc9f0> (a java.lang.Thread) at java.lang.Thread.join(Thread.java:1326) at org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106) at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911) at org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown Source) at scala.Option.foreach(Option.scala:407) at org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911) - locked <0x0004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider) at 
org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7(FsHistoryProvider.scala:541) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7$adapted(FsHistoryProvider.scala:498){code} *HybridStore#close* may take a long time if there is still a lot of data waiting to be written to disk when it is called. I tried a 1.64 GB eventlog. It took 93944 ms to write all data to disk. was: In our SparkHistoryServer, we used these two property to speed up Spark UI's loading: {code:java} spark.history.store.hybridStore.enabled true spark.history.store.hybridStore.maxMemoryUsage 16g {code} Occasionally, we found it took minutes to load a small eventlog which usually took seconds. In the jstack output of SparkHistoryServer, we found that 4 threads were blocked and waiting to lock
[jira] [Updated] (SPARK-39176) Pyspark failed to serialize dates before 1970 in windows
[ https://issues.apache.org/jira/browse/SPARK-39176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-39176: --- Labels: 3.0.1 PySpark WIndows datetime pull-request-available (was: 3.0.1 PySpark WIndows datetime) > Pyspark failed to serialize dates before 1970 in windows > > > Key: SPARK-39176 > URL: https://issues.apache.org/jira/browse/SPARK-39176 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests, Windows >Affects Versions: 3.0.1 >Reporter: AnywalkerGISer >Priority: Major > Labels: 3.0.1, PySpark, WIndows, datetime, pull-request-available > Fix For: 3.0.1 > > > h3. Fix problems with pyspark in Windows > # Fixed datetime conversion to timestamp before 1970; > # Fixed datetime conversion when timestamp is negative; > # Adding a test script. > h3. Pyspark has problems serializing pre-1970 times in Windows > An exception occurs when executing the following code under Windows: > {code:java} > rdd = sc.parallelize([('a', datetime(1957, 1, 9, 0, 0)), > ('b', datetime(2014, 1, 27, 0, 0))]) > df = spark.createDataFrame(rdd, ["id", "date"]) > df.show() > df.printSchema() > print(df.collect()){code} > {code:java} > File "...\spark\python\lib\pyspark.zip\pyspark\sql\types.py", line 195, in > toInternal > else time.mktime(dt.timetuple())) > OverflowError: mktime argument out of range > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:503) >at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:638) >at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621) >at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456) >at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) >at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489) >at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) >at 
scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) >at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) >at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) >at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729) >at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340) >at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872) >at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872) >at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) >at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) >at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) >at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) >at org.apache.spark.scheduler.Task.run(Task.scala:127) >at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446) >at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) >at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) >at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) >at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) >... 1 more {code} > _*and*_ > {code:java} > File ...\spark\python\lib\pyspark.zip\pyspark\sql\types.py, in fromInternal: > Line 207: return datetime.datetime.fromtimestamp(ts // > 100).replace(microsecond=ts % 100) > OSError: [Errno 22] Invalid argument {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
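The traceback quoted above fails inside `time.mktime(dt.timetuple())`, because the C `mktime()` that Windows exposes cannot represent local times before 1970. A common workaround is to compute the epoch offset with pure arithmetic instead of calling `mktime()`. The sketch below is a hedged illustration of that idea, not the actual PySpark patch; it treats naive datetimes as UTC for determinism, whereas PySpark converts through the session-local timezone.

```python
import calendar
from datetime import datetime, timezone

def to_unix_seconds(dt: datetime) -> int:
    # time.mktime() raises OverflowError on Windows for pre-1970 local times;
    # calendar.timegm() and aware-datetime arithmetic avoid the C call.
    if dt.tzinfo is None:
        # Illustrative assumption: interpret naive datetimes as UTC.
        return calendar.timegm(dt.timetuple())
    return int(dt.timestamp())

# Dates from the issue's reproducer: one before the epoch, one after.
print(to_unix_seconds(datetime(1957, 1, 9)))   # negative value, pre-1970
print(to_unix_seconds(datetime(2014, 1, 27)))  # positive value
```

Pre-1970 instants simply map to negative Unix timestamps; nothing about the representation itself overflows, only the platform `mktime()` does.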
[jira] [Assigned] (SPARK-46328) Allocate capacity of array list of TColumns by columns size in TRowSet generation
[ https://issues.apache.org/jira/browse/SPARK-46328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46328: -- Assignee: Apache Spark > Allocate capacity of array list of TColumns by columns size in TRowSet > generation > - > > Key: SPARK-46328 > URL: https://issues.apache.org/jira/browse/SPARK-46328 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Bowen Liang >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > > Background: > ArrayList is created for TColumn value collections in RowSetUtils for TRowSet > generation. Currently, they are created with Java's default capacity of 16, > rather than by the number of columns, which could cause array copying in > assembling each TColumn collections when the column number exceeds the > default capacity. > > Suggested solution: > Allocate enough capacity by columns size for assembling array list of > TColumns in TRowSet generation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
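The fix suggested above, sizing the collection up front when the final length is known, has a direct Python analog. Java's `ArrayList` starts at capacity 16 and copies its backing array on each growth; Python lists over-allocate similarly. The `build_row_set` name and the conversion step below are hypothetical stand-ins for the `RowSetUtils` logic, used only to illustrate the preallocation pattern.

```python
def build_row_set(columns):
    ncols = len(columns)
    # Sized once up front, like `new ArrayList<>(ncols)` in the suggested fix,
    # instead of appending into a default-capacity list.
    tcolumns = [None] * ncols
    for i, col in enumerate(columns):
        tcolumns[i] = list(col)   # stand-in for converting a column to a TColumn
    return tcolumns

rows = build_row_set([("a", "b"), (1, 2)])
print(rows)
```

With wide results (hundreds of columns), preallocating removes the repeated backing-array copies that incremental growth would otherwise trigger.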
[jira] [Assigned] (SPARK-46328) Allocate capacity of array list of TColumns by columns size in TRowSet generation
[ https://issues.apache.org/jira/browse/SPARK-46328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46328: -- Assignee: (was: Apache Spark) > Allocate capacity of array list of TColumns by columns size in TRowSet > generation > - > > Key: SPARK-46328 > URL: https://issues.apache.org/jira/browse/SPARK-46328 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Bowen Liang >Priority: Minor > Labels: pull-request-available > > Background: > ArrayList is created for TColumn value collections in RowSetUtils for TRowSet > generation. Currently, they are created with Java's default capacity of 16, > rather than by the number of columns, which could cause array copying in > assembling each TColumn collections when the column number exceeds the > default capacity. > > Suggested solution: > Allocate enough capacity by columns size for assembling array list of > TColumns in TRowSet generation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46330) Loading of Spark UI blocks for a long time when HybridStore enabled
[ https://issues.apache.org/jira/browse/SPARK-46330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794589#comment-17794589 ] Zhou Yifan commented on SPARK-46330: *HybridStore#close* may take a long time if there is still a lot of data waiting to be written to disk when it is called. I tried a 1.64 GB eventlog. It took 93944 ms to write all data to disk. > Loading of Spark UI blocks for a long time when HybridStore enabled > --- > > Key: SPARK-46330 > URL: https://issues.apache.org/jira/browse/SPARK-46330 > Project: Spark > Issue Type: Bug > Components: UI >Affects Versions: 3.1.2 >Reporter: Zhou Yifan >Priority: Major > > In our SparkHistoryServer, we used these two property to speed up Spark UI's > loading: > {code:java} > spark.history.store.hybridStore.enabled true > spark.history.store.hybridStore.maxMemoryUsage 16g {code} > Occasionally, we found it took minutes to load a small eventlog which usually > took seconds. > In the jstack output of SparkHistoryServer, we found that 4 threads were > blocked and waiting to lock > *org.apache.spark.deploy.history.FsHistoryProvider* object monitor, which was > locked by thread "spark-history-task-0" closing a HybridStore. 
> {code:java} > "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 > tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry > [0x7f3f6476] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386) > - waiting to lock <0x0004c64433f0> (a > org.apache.spark.deploy.history.FsHistoryProvider) > at > org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194) > at > org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182) > at > org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown > Source) > at > org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154) > at > org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > - locked <0x00066effc3e8> (a > org.sparkproject.guava.cache.LocalCache$StrongAccessEntry) > at > org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108) > at > 
org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120) > at > org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251) > at > org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99) > "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 > nid=0x1ac6 in Object.wait() [0x7f41b2cc9000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1252) > - locked <0x00063ccbc9f0> (a java.lang.Thread) > at java.lang.Thread.join(Thread.java:1326) > at > org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106) > at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911) > at > org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown > Source) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911) > - locked <0x0004c64433f0> (a > org.apache.spark.deploy.history.FsHistoryProvider) > at >
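The two stack traces above show the shape of the contention: invalidateUI() holds the FsHistoryProvider monitor while HybridStore.close() joins a still-flushing writer thread, so every getAppUI() caller blocks until the flush finishes. A minimal, self-contained Java sketch of that pattern (all names here are hypothetical illustrations, not Spark code):

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration of the blocking pattern in the jstack output:
// a synchronized close path joins a slow writer thread while holding the
// object's monitor, so other synchronized callers must wait it out.
public class MonitorBlockingDemo {
    private final CountDownLatch closerHasLock = new CountDownLatch(1);
    private final StringBuilder order = new StringBuilder();

    synchronized void invalidateUI() {
        closerHasLock.countDown();       // monitor is now held by the closer
        Thread writer = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
        });
        writer.start();
        try { writer.join(); } catch (InterruptedException ignored) {}  // the HybridStore.close analogue
        order.append("close;");
    }

    synchronized void getAppUI() {       // blocks until invalidateUI() releases the monitor
        order.append("getAppUI;");
    }

    String run() throws InterruptedException {
        Thread closer = new Thread(this::invalidateUI);
        closer.start();
        closerHasLock.await();           // make sure the closer owns the monitor first
        getAppUI();                      // stalls here, like the qtp request threads
        closer.join();
        return order.toString();         // always "close;getAppUI;"
    }
}
```

Running it always records the close path first: the UI request cannot proceed until the join inside the synchronized close path returns, which is exactly why a long HybridStore flush stalls unrelated UI loads.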
[jira] [Updated] (SPARK-46330) Loading of Spark UI blocks for a long time when HybridStore enabled
[ https://issues.apache.org/jira/browse/SPARK-46330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhou Yifan updated SPARK-46330: --- Description: In our SparkHistoryServer, we used these two property to speed up Spark UI's loading: {code:java} spark.history.store.hybridStore.enabled true spark.history.store.hybridStore.maxMemoryUsage 16g {code} Occasionally, we found it took minutes to load a small eventlog which usually took seconds. In the jstack output of SparkHistoryServer, we found that 4 threads were blocked and waiting to lock *org.apache.spark.deploy.history.FsHistoryProvider* object monitor, which was locked by thread "spark-history-task-0" closing a HybridStore. {code:java} "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry [0x7f3f6476] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386) - waiting to lock <0x0004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider) at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194) at org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182) at org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown Source) at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154) at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58) at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) at 
org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) - locked <0x00066effc3e8> (a org.sparkproject.guava.cache.LocalCache$StrongAccessEntry) at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108) at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120) at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251) at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99) "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 nid=0x1ac6 in Object.wait() [0x7f41b2cc9000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1252) - locked <0x00063ccbc9f0> (a java.lang.Thread) at java.lang.Thread.join(Thread.java:1326) at org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106) at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911) at org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown Source) at scala.Option.foreach(Option.scala:407) at org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911) - locked <0x0004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider) at 
org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7(FsHistoryProvider.scala:541) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7$adapted(FsHistoryProvider.scala:498){code} was: In our SparkHistoryServer, we used these two property to speed up Spark UI's loading: {code:java} spark.history.store.hybridStore.enabled true spark.history.store.hybridStore.maxMemoryUsage 16g {code} Occasionally, we found it took minutes to load a small eventlog which usually took seconds. In the jstack output of SparkHistoryServer, we found that 4 threads were blocked and waiting to lock org.apache.spark.deploy.history.FsHistoryProvider object monitor, which was locked by thread "spark-history-task-0" closing a HybridStore. {code:java} "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 tid=0x7f4044042800
[jira] [Updated] (SPARK-46328) Allocate capacity of array list of TColumns by columns size in TRowSet generation
[ https://issues.apache.org/jira/browse/SPARK-46328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Liang updated SPARK-46328: Summary: Allocate capacity of array list of TColumns by columns size in TRowSet generation (was: Allocate capacity by columns size for array list of TColumns in TRowSet generation) > Allocate capacity of array list of TColumns by columns size in TRowSet > generation > - > > Key: SPARK-46328 > URL: https://issues.apache.org/jira/browse/SPARK-46328 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Bowen Liang >Priority: Minor > Labels: pull-request-available > > Background: > ArrayList is created for TColumn value collections in RowSetUtils for TRowSet > generation. Currently, they are created with Java's default capacity of 16, > rather than by the number of columns, which could cause array copying in > assembling each TColumn collections when the column number exceeds the > default capacity. > > Suggested solution: > Allocate enough capacity by columns size for assembling array list of > TColumns in TRowSet generation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
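The suggested solution above can be sketched in plain Java (hypothetical class and method names; the real change lives in Spark's RowSetUtils): presizing the ArrayList to the known column count means add() never has to grow the backing array, avoiding the repeated internal copies described in the issue.

```java
import java.util.ArrayList;
import java.util.List;

public class TColumnListDemo {
    // Hypothetical stand-in for TRowSet assembly: one entry per column.
    static List<String> assembleColumns(int numColumns) {
        // Presize to the column count so add() never triggers an
        // internal array copy while the list grows past its default capacity.
        List<String> tColumns = new ArrayList<>(numColumns);
        for (int i = 0; i < numColumns; i++) {
            tColumns.add("col-" + i);
        }
        return tColumns;
    }
}
```

The optimization only matters when the column count exceeds the default capacity, but presizing is harmless otherwise, so allocating by columns size is a safe default.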
[jira] [Updated] (SPARK-46330) Loading of Spark UI blocks for a long time when HybridStore enabled
[ https://issues.apache.org/jira/browse/SPARK-46330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhou Yifan updated SPARK-46330: --- Affects Version/s: 3.1.2 (was: 3.3.1) > Loading of Spark UI blocks for a long time when HybridStore enabled > --- > > Key: SPARK-46330 > URL: https://issues.apache.org/jira/browse/SPARK-46330 > Project: Spark > Issue Type: Bug > Components: UI >Affects Versions: 3.1.2 >Reporter: Zhou Yifan >Priority: Major > > In our SparkHistoryServer, we used these two property to speed up Spark UI's > loading: > {code:java} > spark.history.store.hybridStore.enabled true > spark.history.store.hybridStore.maxMemoryUsage 16g {code} > Occasionally, we found it took minutes to load a small eventlog which usually > took seconds. > In the jstack output of SparkHistoryServer, we found that 4 threads were > blocked and waiting to lock org.apache.spark.deploy.history.FsHistoryProvider > object monitor, which was locked by thread "spark-history-task-0" closing a > HybridStore. 
> {code:java} > "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 > tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry > [0x7f3f6476] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386) > - waiting to lock <0x0004c64433f0> (a > org.apache.spark.deploy.history.FsHistoryProvider) > at > org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194) > at > org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182) > at > org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown > Source) > at > org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154) > at > org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > - locked <0x00066effc3e8> (a > org.sparkproject.guava.cache.LocalCache$StrongAccessEntry) > at > org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108) > at > 
org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120) > at > org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251) > at > org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99) > "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 > nid=0x1ac6 in Object.wait() [0x7f41b2cc9000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1252) > - locked <0x00063ccbc9f0> (a java.lang.Thread) > at java.lang.Thread.join(Thread.java:1326) > at > org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106) > at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911) > at > org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown > Source) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911) > - locked <0x0004c64433f0> (a > org.apache.spark.deploy.history.FsHistoryProvider) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7(FsHistoryProvider.scala:541) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7$adapted(FsHistoryProvider.scala:498){code} >
[jira] [Updated] (SPARK-46328) Allocate capacity by columns size for array list of TColumns in TRowSet generation
[ https://issues.apache.org/jira/browse/SPARK-46328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46328: --- Labels: pull-request-available (was: ) > Allocate capacity by columns size for array list of TColumns in TRowSet > generation > -- > > Key: SPARK-46328 > URL: https://issues.apache.org/jira/browse/SPARK-46328 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Bowen Liang >Priority: Minor > Labels: pull-request-available > > Background: > ArrayList is created for TColumn value collections in RowSetUtils for TRowSet > generation. Currently, they are created with Java's default capacity of 16, > rather than by the number of columns, which could cause array copying in > assembling each TColumn collections when the column number exceeds the > default capacity. > > Suggested solution: > Allocate enough capacity by columns size for assembling array list of > TColumns in TRowSet generation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46330) Loading of Spark UI blocks for a long time when HybridStore enabled
[ https://issues.apache.org/jira/browse/SPARK-46330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhou Yifan updated SPARK-46330: --- Description: In our SparkHistoryServer, we used these two property to speed up Spark UI's loading: {code:java} spark.history.store.hybridStore.enabled true spark.history.store.hybridStore.maxMemoryUsage 16g {code} Occasionally, we found it took minutes to load a small eventlog which usually took seconds. In the jstack output of SparkHistoryServer, we found that 4 threads were blocked and waiting to lock org.apache.spark.deploy.history.FsHistoryProvider object monitor, which was locked by thread "spark-history-task-0" closing a HybridStore. {code:java} "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry [0x7f3f6476] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386) - waiting to lock <0x0004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider) at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194) at org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182) at org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown Source) at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154) at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58) at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) at 
org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) - locked <0x00066effc3e8> (a org.sparkproject.guava.cache.LocalCache$StrongAccessEntry) at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108) at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120) at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251) at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99) "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 nid=0x1ac6 in Object.wait() [0x7f41b2cc9000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1252) - locked <0x00063ccbc9f0> (a java.lang.Thread) at java.lang.Thread.join(Thread.java:1326) at org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106) at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911) at org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown Source) at scala.Option.foreach(Option.scala:407) at org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911) - locked <0x0004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider) at 
org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7(FsHistoryProvider.scala:541) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7$adapted(FsHistoryProvider.scala:498){code} was: In our SparkHistoryServer, we used these two property to speed up Spark UI's loading: {code:java} spark.history.store.hybridStore.enabled true spark.history.store.hybridStore.maxMemoryUsage 16g {code} Occasionally, we found that it took minutes to load a small eventlog which usually took seconds. In the jstack output of SparkHistoryServer, we found that 4 threads were blocked and waiting to lock org.apache.spark.deploy.history.FsHistoryProvider object monitor, which was locked by thread "spark-history-task-0" closing a HybridStore. {code:java} "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0
[jira] [Created] (SPARK-46329) Spark UI's loading blocks for a long time when HybridStore enabled
Zhou Yifan created SPARK-46329: -- Summary: Spark UI's loading blocks for a long time when HybridStore enabled Key: SPARK-46329 URL: https://issues.apache.org/jira/browse/SPARK-46329 Project: Spark Issue Type: Bug Components: UI Affects Versions: 3.3.1 Reporter: Zhou Yifan In our SparkHistoryServer, we used these two property to speed up Spark UI's loading: {code:java} spark.history.store.hybridStore.enabled true spark.history.store.hybridStore.maxMemoryUsage 16g {code} Occasionally, we found that it took minutes to load a small eventlog which usually took seconds. In the jstack output of SparkHistoryServer, we found that 4 threads were blocked and waiting to lock org.apache.spark.deploy.history.FsHistoryProvider object monitor, which was locked by thread "spark-history-task-0" closing a HybridStore. {code:java} "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry [0x7f3f6476] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386) - waiting to lock <0x0004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider) at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194) at org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182) at org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown Source) at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154) at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58) at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) at 
org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) - locked <0x00066effc3e8> (a org.sparkproject.guava.cache.LocalCache$StrongAccessEntry) at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108) at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120) at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251) at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99) "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 nid=0x1ac6 in Object.wait() [0x7f41b2cc9000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1252) - locked <0x00063ccbc9f0> (a java.lang.Thread) at java.lang.Thread.join(Thread.java:1326) at org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106) at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911) at org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown Source)at scala.Option.foreach(Option.scala:407) at org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911) - locked <0x0004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider) at 
org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7(FsHistoryProvider.scala:541) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7$adapted(FsHistoryProvider.scala:498){code}
[jira] [Created] (SPARK-46330) Spark UI's loading blocks for a long time when HybridStore enabled
Zhou Yifan created SPARK-46330: -- Summary: Spark UI's loading blocks for a long time when HybridStore enabled Key: SPARK-46330 URL: https://issues.apache.org/jira/browse/SPARK-46330 Project: Spark Issue Type: Bug Components: UI Affects Versions: 3.3.1 Reporter: Zhou Yifan In our SparkHistoryServer, we used these two property to speed up Spark UI's loading: {code:java} spark.history.store.hybridStore.enabled true spark.history.store.hybridStore.maxMemoryUsage 16g {code} Occasionally, we found that it took minutes to load a small eventlog which usually took seconds. In the jstack output of SparkHistoryServer, we found that 4 threads were blocked and waiting to lock org.apache.spark.deploy.history.FsHistoryProvider object monitor, which was locked by thread "spark-history-task-0" closing a HybridStore. {code:java} "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry [0x7f3f6476] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386) - waiting to lock <0x0004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider) at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194) at org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182) at org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown Source) at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154) at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58) at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) at 
org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) - locked <0x00066effc3e8> (a org.sparkproject.guava.cache.LocalCache$StrongAccessEntry) at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108) at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120) at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251) at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99) "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 nid=0x1ac6 in Object.wait() [0x7f41b2cc9000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1252) - locked <0x00063ccbc9f0> (a java.lang.Thread) at java.lang.Thread.join(Thread.java:1326) at org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106) at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911) at org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown Source)at scala.Option.foreach(Option.scala:407) at org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911) - locked <0x0004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider) at 
org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7(FsHistoryProvider.scala:541) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7$adapted(FsHistoryProvider.scala:498){code}
[jira] [Updated] (SPARK-46330) Loading of Spark UI blocks for a long time when HybridStore enabled
[ https://issues.apache.org/jira/browse/SPARK-46330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhou Yifan updated SPARK-46330: --- Summary: Loading of Spark UI blocks for a long time when HybridStore enabled (was: Spark UI's loading blocks for a long time when HybridStore enabled) > Loading of Spark UI blocks for a long time when HybridStore enabled > --- > > Key: SPARK-46330 > URL: https://issues.apache.org/jira/browse/SPARK-46330 > Project: Spark > Issue Type: Bug > Components: UI >Affects Versions: 3.3.1 >Reporter: Zhou Yifan >Priority: Major > > In our SparkHistoryServer, we used these two property to speed up Spark UI's > loading: > > {code:java} > spark.history.store.hybridStore.enabled true > spark.history.store.hybridStore.maxMemoryUsage 16g {code} > Occasionally, we found that it took minutes to load a small eventlog which > usually took seconds. > In the jstack output of SparkHistoryServer, we found that 4 threads were > blocked and waiting to lock org.apache.spark.deploy.history.FsHistoryProvider > object monitor, which was > locked by thread "spark-history-task-0" closing a HybridStore. 
> {code:java} > "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 > tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry > [0x7f3f6476] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386) > - waiting to lock <0x0004c64433f0> (a > org.apache.spark.deploy.history.FsHistoryProvider) > at > org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194) > at > org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182) > at > org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown > Source) > at > org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154) > at > org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > - locked <0x00066effc3e8> (a > org.sparkproject.guava.cache.LocalCache$StrongAccessEntry) > at > org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108) > at > 
org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120) > at > org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251) > at > org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99) > "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 > nid=0x1ac6 in Object.wait() [0x7f41b2cc9000] java.lang.Thread.State: > WAITING (on object monitor) at java.lang.Object.wait(Native Method) at > java.lang.Thread.join(Thread.java:1252) - locked <0x00063ccbc9f0> (a > java.lang.Thread) at java.lang.Thread.join(Thread.java:1326) at > org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106) > at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913) >at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911) >at > org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown > Source)at scala.Option.foreach(Option.scala:407) at > org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911) > - locked <0x0004c64433f0> (a > org.apache.spark.deploy.history.FsHistoryProvider) at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7(FsHistoryProvider.scala:541) >at >
[jira] [Updated] (SPARK-46328) Allocate capacity by columns size for array list of TColumns in TRowSet generation
[ https://issues.apache.org/jira/browse/SPARK-46328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Liang updated SPARK-46328: Summary: Allocate capacity by columns size for array list of TColumns in TRowSet generation (was: Allocate enough capacity for assembling array list of TColumns in TRowSet generation) > Allocate capacity by columns size for array list of TColumns in TRowSet > generation > -- > > Key: SPARK-46328 > URL: https://issues.apache.org/jira/browse/SPARK-46328 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Bowen Liang >Priority: Minor > > Background: > An ArrayList is created for each TColumn value collection in RowSetUtils during TRowSet > generation. Currently, these lists are created with Java's default capacity of 16, > rather than sized by the number of columns, which can cause repeated internal array copying > while assembling each TColumn collection once the column count exceeds the > default capacity. > > Suggested solution: > Allocate sufficient capacity up front, based on the column count, when assembling the array list of > TColumns in TRowSet generation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46328) Allocate enough capacity for assembling array list of TColumns in TRowSet generation
Bowen Liang created SPARK-46328: --- Summary: Allocate enough capacity for assembling array list of TColumns in TRowSet generation Key: SPARK-46328 URL: https://issues.apache.org/jira/browse/SPARK-46328 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.5.0 Reporter: Bowen Liang Background: An ArrayList is created for each TColumn value collection in RowSetUtils during TRowSet generation. Currently, these lists are created with Java's default capacity of 16, rather than sized by the number of columns, which can cause repeated internal array copying while assembling each TColumn collection once the column count exceeds the default capacity. Suggested solution: Allocate sufficient capacity up front, based on the column count, when assembling the array list of TColumns in TRowSet generation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
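The improvement described in SPARK-46328 is the standard ArrayList pre-sizing idiom: a default-capacity (16) ArrayList must repeatedly grow and copy its backing array once more than 16 elements are added, while a list sized to the known element count never reallocates. A minimal sketch (the `assembleColumns` helper and its string payloads are made up for illustration, not the actual RowSetUtils code):

```java
import java.util.ArrayList;
import java.util.List;

public class PresizedColumnList {
    // Hypothetical stand-in for RowSetUtils: build one list entry per column.
    static List<String> assembleColumns(int numColumns) {
        // Pre-sizing to numColumns avoids the grow-and-copy cycles that a
        // default-capacity (16) ArrayList performs once numColumns > 16.
        List<String> tColumns = new ArrayList<>(numColumns);
        for (int i = 0; i < numColumns; i++) {
            tColumns.add("col_" + i);
        }
        return tColumns;
    }

    public static void main(String[] args) {
        // A wide result set (100 columns) would trigger several internal
        // reallocations with the default capacity; none with pre-sizing.
        List<String> cols = assembleColumns(100);
        if (cols.size() != 100) throw new AssertionError();
        System.out.println("assembled " + cols.size() + " columns without reallocation");
    }
}
```

The behavior is identical either way; the pre-sized version only skips the intermediate `Arrays.copyOf` calls, which is why the ticket is filed as a Minor improvement rather than a bug.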
[jira] [Updated] (SPARK-46327) Reorganize `SeriesStringTests`
[ https://issues.apache.org/jira/browse/SPARK-46327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46327: --- Labels: pull-request-available (was: ) > Reorganize `SeriesStringTests` > -- > > Key: SPARK-46327 > URL: https://issues.apache.org/jira/browse/SPARK-46327 > Project: Spark > Issue Type: Test > Components: Connect, PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46327) Reorganize `SeriesStringTests`
Ruifeng Zheng created SPARK-46327: - Summary: Reorganize `SeriesStringTests` Key: SPARK-46327 URL: https://issues.apache.org/jira/browse/SPARK-46327 Project: Spark Issue Type: Test Components: Connect, PS, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46323) Fix the output name of pyspark.sql.functions.now
[ https://issues.apache.org/jira/browse/SPARK-46323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46323: Assignee: Hyukjin Kwon > Fix the output name of pyspark.sql.functions.now > > > Key: SPARK-46323 > URL: https://issues.apache.org/jira/browse/SPARK-46323 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > It currently returns a column named {{current_timestamp}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46323) Fix the output name of pyspark.sql.functions.now
[ https://issues.apache.org/jira/browse/SPARK-46323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46323. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44252 [https://github.com/apache/spark/pull/44252] > Fix the output name of pyspark.sql.functions.now > > > Key: SPARK-46323 > URL: https://issues.apache.org/jira/browse/SPARK-46323 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > It currently returns a column named {{current_timestamp}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org