[jira] [Commented] (SPARK-26303) Return partial results for bad JSON records
[ https://issues.apache.org/jira/browse/SPARK-26303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716481#comment-16716481 ]

ASF GitHub Bot commented on SPARK-26303:

AmplabJenkins removed a comment on issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON records
URL: https://github.com/apache/spark/pull/23253#issuecomment-446108177

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99953/

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Return partial results for bad JSON records
> -------------------------------------------
>
> Key: SPARK-26303
> URL: https://issues.apache.org/jira/browse/SPARK-26303
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Maxim Gekk
> Priority: Minor
> Fix For: 3.0.0
>
> Currently, the JSON datasource and JSON functions return a row with all nulls for a malformed JSON string in PERMISSIVE mode when the specified schema has a struct type. All nulls are returned even if some of the fields were parsed and converted to the desired types successfully. This ticket aims to solve the problem by returning the already parsed fields. The corrupt column, specified via the JSON option `columnNameOfCorruptRecord` or the SQL config, should contain the whole original JSON string.
> For example, if the input has one JSON string:
> {code:json}
> {"a":0.1,"b":{},"c":"def"}
> {code}
> and the specified schema is:
> {code:sql}
> a DOUBLE, b ARRAY, c STRING, _corrupt_record STRING
> {code}
> the expected output of `from_json` in PERMISSIVE mode is:
> {code}
> +---+----+---+--------------------------+
> |a  |b   |c  |_corrupt_record           |
> +---+----+---+--------------------------+
> |0.1|null|def|{"a":0.1,"b":{},"c":"def"}|
> +---+----+---+--------------------------+
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
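The PERMISSIVE-mode behavior the ticket asks for can be illustrated outside Spark. The sketch below is a hypothetical Python analogue (not Spark's implementation): it parses a JSON string best-effort against a simple type map, nulls out only the fields that fail conversion, and preserves the raw input in the corrupt-record column, matching the expected output above.

```python
import json

def from_json_permissive(s, schema, corrupt_col="_corrupt_record"):
    """Best-effort parse: keep fields that convert, null out the rest,
    and preserve the raw input in the corrupt-record column."""
    row = {name: None for name in schema}
    row[corrupt_col] = None
    try:
        data = json.loads(s)
    except json.JSONDecodeError:
        row[corrupt_col] = s  # totally malformed: only the raw string survives
        return row
    ok = True
    for name, typ in schema.items():
        value = data.get(name)
        if isinstance(value, typ):
            row[name] = value
        elif value is not None:
            ok = False  # field present but wrong type -> partial result
    if not ok:
        row[corrupt_col] = s
    return row

schema = {"a": float, "b": list, "c": str}
print(from_json_permissive('{"a":0.1,"b":{},"c":"def"}', schema))
# a and c parse, b ({} is not an array) becomes None, raw string is kept
```

Here `b` fails because `{}` is an object where the schema expects an array, so only that field is nulled, just as in the example table.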
[jira] [Commented] (SPARK-24102) RegressionEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716483#comment-16716483 ]

ASF GitHub Bot commented on SPARK-24102:

AmplabJenkins removed a comment on issue #17085: [SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator
URL: https://github.com/apache/spark/pull/17085#issuecomment-446108200

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99946/

> RegressionEvaluator should use sample weight data
> -------------------------------------------------
>
> Key: SPARK-24102
> URL: https://issues.apache.org/jira/browse/SPARK-24102
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.0.2
> Reporter: Ilya Matiach
> Priority: Major
> Labels: starter
>
> The LogisticRegression and LinearRegression models support training with a weight column, but the corresponding evaluators do not support computing metrics using those weights. This breaks model selection using CrossValidator.
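What "computing metrics using those weights" means can be shown with a minimal sketch. The function below is an illustrative weighted RMSE in plain Python, not the RegressionEvaluator API: each squared error is scaled by its sample weight, so a sample with weight zero contributes nothing to the metric.

```python
import math

def weighted_rmse(y_true, y_pred, w):
    """RMSE where each squared error is scaled by its sample weight."""
    num = sum(wi * (t - p) ** 2 for t, p, wi in zip(y_true, y_pred, w))
    return math.sqrt(num / sum(w))

y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
# All the error is on the third sample, so its weight controls the metric:
print(weighted_rmse(y_true, y_pred, [1.0, 1.0, 1.0]))  # unweighted case
print(weighted_rmse(y_true, y_pred, [1.0, 1.0, 0.0]))  # error ignored -> 0.0
```

An evaluator that ignores the weight column always computes the first value, which is why model selection over weighted training data breaks.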
[jira] [Commented] (SPARK-26300) The `checkForStreaming` method may be called twice in `createQuery`
[ https://issues.apache.org/jira/browse/SPARK-26300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716478#comment-16716478 ]

ASF GitHub Bot commented on SPARK-26300:

AmplabJenkins removed a comment on issue #23251: [SPARK-26300][SS] Remove a redundant `checkForStreaming` call
URL: https://github.com/apache/spark/pull/23251#issuecomment-446108206

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99949/

> The `checkForStreaming` method may be called twice in `createQuery`
> -------------------------------------------------------------------
>
> Key: SPARK-26300
> URL: https://issues.apache.org/jira/browse/SPARK-26300
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.4.0
> Reporter: liuxian
> Priority: Minor
>
> If {{checkForContinuous}} is called ({{checkForStreaming}} is called inside {{checkForContinuous}}), then {{checkForStreaming}} ends up being called twice in {{createQuery}}. The second call is unnecessary, and the {{checkForStreaming}} method does a lot of work, so one of the calls should be removed.
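The redundancy being removed can be sketched as follows. The Python names below are hypothetical analogues of the Scala methods, used only to show that after the fix the validation runs exactly once when `createQuery` relies on the check already embedded in `checkForContinuous`.

```python
# Hypothetical analogue of SPARK-26300: createQuery used to validate the plan
# with check_for_streaming directly AND via check_for_continuous.
calls = {"check_for_streaming": 0}

def check_for_streaming(plan):
    calls["check_for_streaming"] += 1  # stands in for the expensive validation
    return "streaming" in plan

def check_for_continuous(plan):
    # The continuous check already performs the streaming validation.
    return check_for_streaming(plan) and "continuous" in plan

def create_query(plan):
    # Fixed version: no direct check_for_streaming(plan) call here; the
    # validation inside check_for_continuous is sufficient.
    check_for_continuous(plan)
    return {"plan": plan}

create_query({"streaming", "continuous"})
print(calls["check_for_streaming"])  # 1: the validation ran once, not twice
```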
[jira] [Commented] (SPARK-24102) RegressionEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716455#comment-16716455 ]

ASF GitHub Bot commented on SPARK-24102:

AmplabJenkins removed a comment on issue #17085: [SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator
URL: https://github.com/apache/spark/pull/17085#issuecomment-446108108

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99952/
[jira] [Commented] (SPARK-26303) Return partial results for bad JSON records
[ https://issues.apache.org/jira/browse/SPARK-26303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716457#comment-16716457 ]

ASF GitHub Bot commented on SPARK-26303:

AmplabJenkins removed a comment on issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON records
URL: https://github.com/apache/spark/pull/23253#issuecomment-446108172

Merged build finished. Test FAILed.
[jira] [Commented] (SPARK-26316) Because of the perf degradation in TPC-DS, we currently partial revert SPARK-21052:Add hash map metrics to join,
[ https://issues.apache.org/jira/browse/SPARK-26316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716462#comment-16716462 ]

ASF GitHub Bot commented on SPARK-26316:

SparkQA removed a comment on issue #23269: [SPARK-26316] Revert hash join metrics in spark 21052 that causes performance degradation
URL: https://github.com/apache/spark/pull/23269#issuecomment-446083119

**[Test build #99951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99951/testReport)** for PR 23269 at commit [`a46d18e`](https://github.com/apache/spark/commit/a46d18e2a6ae822a1e1d903e54ab928096cb2339).

> Because of the perf degradation in TPC-DS, we currently partially revert SPARK-21052: Add hash map metrics to join
> ------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-26316
> URL: https://issues.apache.org/jira/browse/SPARK-26316
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
> Reporter: Ke Jia
> Priority: Major
>
> The code at [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486] and [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487], added in SPARK-21052, causes performance degradation in Spark 2.3. The results for all TPC-DS queries at 1TB are in [TPC-DS result|https://docs.google.com/spreadsheets/d/18a5BdOlmm8euTaRodyeWum9yu92mbWWu6JbhGXtr7yE/edit#gid=0]
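Per-row metric updates in a join's probe loop are expensive precisely because they run once per key lookup. One generic mitigation (the PR above instead simply reverts the metrics) is to accumulate counts in a local variable and publish them to the shared metric once per batch. The sketch below is illustrative only; the names are hypothetical, not Spark's SQLMetric API.

```python
def probe(keys, table, metrics):
    """Count join-key hits; buffer the lookup counter locally and publish it
    once after the loop, instead of touching the shared metrics dict on
    every iteration of the hot loop."""
    lookups = 0
    hits = 0
    for k in keys:
        lookups += 1          # cheap local increment inside the hot loop
        if k in table:
            hits += 1
    # single publish outside the loop
    metrics["numKeyLookups"] = metrics.get("numKeyLookups", 0) + lookups
    return hits

metrics = {}
print(probe([1, 2, 3, 4], {2, 4}, metrics), metrics)
```

The observable metric value is identical; only the number of shared-state updates changes, which is what matters in a tight loop.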
[jira] [Commented] (SPARK-26303) Return partial results for bad JSON records
[ https://issues.apache.org/jira/browse/SPARK-26303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716464#comment-16716464 ]

ASF GitHub Bot commented on SPARK-26303:

HyukjinKwon commented on issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON records
URL: https://github.com/apache/spark/pull/23253#issuecomment-446108313

Merged to master.
[jira] [Commented] (SPARK-26300) The `checkForStreaming` mothod may be called twice in `createQuery`
[ https://issues.apache.org/jira/browse/SPARK-26300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716468#comment-16716468 ]

ASF GitHub Bot commented on SPARK-26300:

AmplabJenkins removed a comment on issue #23251: [SPARK-26300][SS] Remove a redundant `checkForStreaming` call
URL: https://github.com/apache/spark/pull/23251#issuecomment-446108201

Merged build finished. Test FAILed.
[jira] [Commented] (SPARK-26265) deadlock between TaskMemoryManager and BytesToBytesMap$MapIterator
[ https://issues.apache.org/jira/browse/SPARK-26265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716471#comment-16716471 ]

ASF GitHub Bot commented on SPARK-26265:

AmplabJenkins commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
URL: https://github.com/apache/spark/pull/23272#issuecomment-446108381

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5959/

> deadlock between TaskMemoryManager and BytesToBytesMap$MapIterator
> ------------------------------------------------------------------
>
> Key: SPARK-26265
> URL: https://issues.apache.org/jira/browse/SPARK-26265
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.2
> Reporter: qian han
> Priority: Major
>
> The application is running on a cluster with 72000 cores and 182000G mem.
> Environment:
> |spark.dynamicAllocation.minExecutors|5|
> |spark.dynamicAllocation.initialExecutors|30|
> |spark.dynamicAllocation.maxExecutors|400|
> |spark.executor.cores|4|
> |spark.executor.memory|20g|
>
> Stage description:
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:364)
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:357)
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:193)
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:498)
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> jstack information as follows:
> Found one Java-level deadlock:
> "Thread-ScriptTransformation-Feed": waiting to lock monitor 0x00e0cb18 (object 0x0002f1641538, a org.apache.spark.memory.TaskMemoryManager), which is held by "Executor task launch worker for task 18899"
> "Executor task launch worker for task 18899": waiting to lock monitor 0x00e09788 (object 0x000302faa3b0, a org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator), which is held by "Thread-ScriptTransformation-Feed"
>
> Java stack information for the threads listed above:
> "Thread-ScriptTransformation-Feed":
> at org.apache.spark.memory.TaskMemoryManager.freePage(TaskMemoryManager.java:332)
> - waiting to lock <0x0002f1641538> (a org.apache.spark.memory.TaskMemoryManager)
> at org.apache.spark.memory.MemoryConsumer.freePage(MemoryConsumer.java:130)
> at org.apache.spark.unsafe.map.BytesToBytesMap.access$300(BytesToBytesMap.java:66)
> at org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.advanceToNextPage(BytesToBytesMap.java:274)
> - locked <0x000302faa3b0> (a org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator)
> at org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.next(BytesToBytesMap.java:313)
> at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap$1.next(UnsafeFixedWidthAggregationMap.java:173)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at
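The cycle in the jstack output, where each thread holds one monitor while waiting for the other, is the classic lock-ordering deadlock. As a generic illustration (assuming nothing about the actual fix in PR #23272, which lives in Spark's Java code), the sketch below shows the standard discipline: when both locks are needed, every thread acquires them in the same global order, so no "holds A, waits for B / holds B, waits for A" cycle can form.

```python
import threading

memory_manager_lock = threading.Lock()  # stands in for TaskMemoryManager's monitor
map_iterator_lock = threading.Lock()    # stands in for MapIterator's monitor

def worker(name, results):
    # Both threads sort the locks by a stable key (object id) and acquire
    # them in that order, so neither can hold one while the other holds
    # its partner in the reverse order.
    first, second = sorted([memory_manager_lock, map_iterator_lock], key=id)
    with first:
        with second:
            results.append(name)

results = []
threads = [threading.Thread(target=worker, args=(n, results))
           for n in ("feed-thread", "task-worker")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # both threads complete; no deadlock
```

With the unordered variant (one thread taking the locks in the opposite order), the jstack picture above is exactly what you would eventually capture.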
[jira] [Commented] (SPARK-26262) Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE
[ https://issues.apache.org/jira/browse/SPARK-26262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716459#comment-16716459 ]

ASF GitHub Bot commented on SPARK-26262:

SparkQA removed a comment on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE
URL: https://github.com/apache/spark/pull/23213#issuecomment-446069680

**[Test build #99945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99945/testReport)** for PR 23213 at commit [`a9c108f`](https://github.com/apache/spark/commit/a9c108fa090b847d48848cf6d679aa6747dcc534).

> Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-26262
> URL: https://issues.apache.org/jira/browse/SPARK-26262
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Takeshi Yamamuro
> Priority: Minor
>
> For better test coverage, we need to run `SQLQueryTestSuite` on 4 mixed config sets:
> 1. WHOLESTAGE_CODEGEN_ENABLED=true, CODEGEN_FACTORY_MODE=CODEGEN_ONLY
> 2. WHOLESTAGE_CODEGEN_ENABLED=false, CODEGEN_FACTORY_MODE=CODEGEN_ONLY
> 3. WHOLESTAGE_CODEGEN_ENABLED=true, CODEGEN_FACTORY_MODE=NO_CODEGEN
> 4. WHOLESTAGE_CODEGEN_ENABLED=false, CODEGEN_FACTORY_MODE=NO_CODEGEN
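The four config sets enumerated in the description are simply the cross product of the two flags. A sketch of generating them rather than listing them by hand (the dict keys are illustrative labels, not the SQLConf API):

```python
from itertools import product

# Cross product of the two codegen flags -> the 4 combinations in the ticket.
configs = [
    {"WHOLESTAGE_CODEGEN_ENABLED": wsc, "CODEGEN_FACTORY_MODE": mode}
    for wsc, mode in product([True, False], ["CODEGEN_ONLY", "NO_CODEGEN"])
]
for c in configs:
    print(c)
```

Generating the matrix this way keeps the test suite honest if a third dimension is ever added: the loop body does not change, only the `product` arguments.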
[jira] [Commented] (SPARK-26300) The `checkForStreaming` method may be called twice in `createQuery`
[ https://issues.apache.org/jira/browse/SPARK-26300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716467#comment-16716467 ]

ASF GitHub Bot commented on SPARK-26300:

SparkQA removed a comment on issue #23251: [SPARK-26300][SS] Remove a redundant `checkForStreaming` call
URL: https://github.com/apache/spark/pull/23251#issuecomment-446081221

**[Test build #99949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99949/testReport)** for PR 23251 at commit [`b1e71ee`](https://github.com/apache/spark/commit/b1e71ee7a723d63f1cf3c0754f2372eb185439d3).
[jira] [Commented] (SPARK-26265) deadlock between TaskMemoryManager and BytesToBytesMap$MapIterator
[ https://issues.apache.org/jira/browse/SPARK-26265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716470#comment-16716470 ]

ASF GitHub Bot commented on SPARK-26265:

AmplabJenkins commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
URL: https://github.com/apache/spark/pull/23272#issuecomment-446108377

Merged build finished. Test PASSed.
[jira] [Commented] (SPARK-26316) Because of the perf degradation in TPC-DS, we currently partial revert SPARK-21052:Add hash map metrics to join,
[ https://issues.apache.org/jira/browse/SPARK-26316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716463#comment-16716463 ]

ASF GitHub Bot commented on SPARK-26316:

AmplabJenkins removed a comment on issue #23269: [SPARK-26316] Revert hash join metrics in spark 21052 that causes performance degradation
URL: https://github.com/apache/spark/pull/23269#issuecomment-446108181

Merged build finished. Test FAILed.
[jira] [Commented] (SPARK-24102) RegressionEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716456#comment-16716456 ]

ASF GitHub Bot commented on SPARK-24102:

AmplabJenkins removed a comment on issue #17085: [SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator
URL: https://github.com/apache/spark/pull/17085#issuecomment-446108099

Merged build finished. Test FAILed.
[jira] [Commented] (SPARK-26098) Show associated SQL query in Job page
[ https://issues.apache.org/jira/browse/SPARK-26098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716465#comment-16716465 ]

ASF GitHub Bot commented on SPARK-26098:

SparkQA removed a comment on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
URL: https://github.com/apache/spark/pull/23068#issuecomment-446094215

**[Test build #99954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99954/testReport)** for PR 23068 at commit [`0a63604`](https://github.com/apache/spark/commit/0a636049ecc721cdd31cd676fce79aeb6582dd7c).

> Show associated SQL query in Job page
> -------------------------------------
>
> Key: SPARK-26098
> URL: https://issues.apache.org/jira/browse/SPARK-26098
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 3.0.0
> Reporter: Gengliang Wang
> Priority: Major
>
> For jobs associated with SQL queries, it would be easier to understand the context by showing the SQL query on the Job detail page.
[jira] [Commented] (SPARK-24102) RegressionEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716458#comment-16716458 ] ASF GitHub Bot commented on SPARK-24102: SparkQA removed a comment on issue #17085: [SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator URL: https://github.com/apache/spark/pull/17085#issuecomment-446083138 **[Test build #99952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99952/testReport)** for PR 17085 at commit [`0cb2daf`](https://github.com/apache/spark/commit/0cb2daf35888d80c5c223e16505354571d87d383).
[jira] [Commented] (SPARK-26265) deadlock between TaskMemoryManager and BytesToBytesMap$MapIterator
[ https://issues.apache.org/jira/browse/SPARK-26265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716469#comment-16716469 ] ASF GitHub Bot commented on SPARK-26265: SparkQA commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-446108367 **[Test build #99956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99956/testReport)** for PR 23272 at commit [`0405527`](https://github.com/apache/spark/commit/04055278a02800c6d3ac67ddb2d9acc2c3baa18d).
> deadlock between TaskMemoryManager and BytesToBytesMap$MapIterator
> ------------------------------------------------------------------
>
> Key: SPARK-26265
> URL: https://issues.apache.org/jira/browse/SPARK-26265
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.2
> Reporter: qian han
> Priority: Major
>
> The application is running on a cluster with 72000 cores and 182000G mem.
> Environment:
> |spark.dynamicAllocation.minExecutors|5|
> |spark.dynamicAllocation.initialExecutors|30|
> |spark.dynamicAllocation.maxExecutors|400|
> |spark.executor.cores|4|
> |spark.executor.memory|20g|
>
> Stage description:
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:364)
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:357)
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:193)
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:498)
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> jstack information as follows:
> Found one Java-level deadlock:
> =============================
> "Thread-ScriptTransformation-Feed":
>   waiting to lock monitor 0x00e0cb18 (object 0x0002f1641538, a org.apache.spark.memory.TaskMemoryManager),
>   which is held by "Executor task launch worker for task 18899"
> "Executor task launch worker for task 18899":
>   waiting to lock monitor 0x00e09788 (object 0x000302faa3b0, a org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator),
>   which is held by "Thread-ScriptTransformation-Feed"
> Java stack information for the threads listed above:
> ===================================================
> "Thread-ScriptTransformation-Feed":
>   at org.apache.spark.memory.TaskMemoryManager.freePage(TaskMemoryManager.java:332)
>   - waiting to lock <0x0002f1641538> (a org.apache.spark.memory.TaskMemoryManager)
>   at org.apache.spark.memory.MemoryConsumer.freePage(MemoryConsumer.java:130)
>   at org.apache.spark.unsafe.map.BytesToBytesMap.access$300(BytesToBytesMap.java:66)
>   at org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.advanceToNextPage(BytesToBytesMap.java:274)
>   - locked <0x000302faa3b0> (a org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator)
>   at org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.next(BytesToBytesMap.java:313)
>   at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap$1.next(UnsafeFixedWidthAggregationMap.java:173)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
>   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at
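The cycle in the jstack above is the classic two-lock inversion: one thread holds the MapIterator monitor and waits for the TaskMemoryManager monitor, while the other thread holds them in the opposite order. A minimal sketch in Python threading (illustrative names only, not the actual patch in PR #23272) of the standard remedy, a single global lock order:

```python
import threading

# Stand-ins for the two monitors in the jstack (names are illustrative only).
task_memory_manager_lock = threading.Lock()
map_iterator_lock = threading.Lock()

events = []

def free_page():
    # Deadlock-free discipline: every code path takes the manager lock first,
    # then the iterator lock, so no lock-order cycle can form.
    with task_memory_manager_lock:
        with map_iterator_lock:
            events.append("freed page")

def advance_to_next_page():
    # Same global order as free_page(); the inverted order is what deadlocked.
    with task_memory_manager_lock:
        with map_iterator_lock:
            events.append("advanced page")

threads = [threading.Thread(target=free_page),
           threading.Thread(target=advance_to_next_page)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(events)  # both threads complete; no deadlock
```

The design point is that a cycle in the waits-for graph needs at least two threads acquiring the same pair of locks in opposite orders; fixing one order breaks every such cycle.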
[jira] [Commented] (SPARK-26262) Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE
[ https://issues.apache.org/jira/browse/SPARK-26262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716473#comment-16716473 ] ASF GitHub Bot commented on SPARK-26262: AmplabJenkins commented on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE URL: https://github.com/apache/spark/pull/23213#issuecomment-446108440 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99945/ Test FAILed.
> Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and
> CODEGEN_FACTORY_MODE
> ---------------------------------------------------------------------------
>
> Key: SPARK-26262
> URL: https://issues.apache.org/jira/browse/SPARK-26262
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Takeshi Yamamuro
> Priority: Minor
>
> For better test coverage, we need to run `SQLQueryTestSuite` on 4 mixed
> config sets:
> 1. WHOLESTAGE_CODEGEN_ENABLED=true, CODEGEN_FACTORY_MODE=CODEGEN_ONLY
> 2. WHOLESTAGE_CODEGEN_ENABLED=false, CODEGEN_FACTORY_MODE=CODEGEN_ONLY
> 3. WHOLESTAGE_CODEGEN_ENABLED=true, CODEGEN_FACTORY_MODE=NO_CODEGEN
> 4. WHOLESTAGE_CODEGEN_ENABLED=false, CODEGEN_FACTORY_MODE=NO_CODEGEN
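The four config sets in the ticket are just the cross-product of the two settings. A sketch (plain Python, using the ticket's constant names as stand-in keys) of how a harness could enumerate them rather than hard-coding all four:

```python
from itertools import product

# The ticket refers to the settings only by their internal constant names,
# so those names are used as dictionary keys here.
config_sets = [
    {"WHOLESTAGE_CODEGEN_ENABLED": ws, "CODEGEN_FACTORY_MODE": mode}
    for ws, mode in product([True, False], ["CODEGEN_ONLY", "NO_CODEGEN"])
]

for cfg in config_sets:
    # A test harness would re-run SQLQueryTestSuite once per config set.
    print(cfg)
```

Enumerating the product keeps the suite in sync if a third dimension is ever added: one more list in `product(...)` doubles the matrix without touching the loop.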
[jira] [Commented] (SPARK-26098) Show associated SQL query in Job page
[ https://issues.apache.org/jira/browse/SPARK-26098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716466#comment-16716466 ] ASF GitHub Bot commented on SPARK-26098: AmplabJenkins removed a comment on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page URL: https://github.com/apache/spark/pull/23068#issuecomment-446108049 Merged build finished. Test FAILed.
[jira] [Commented] (SPARK-26262) Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE
[ https://issues.apache.org/jira/browse/SPARK-26262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716472#comment-16716472 ] ASF GitHub Bot commented on SPARK-26262: AmplabJenkins commented on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE URL: https://github.com/apache/spark/pull/23213#issuecomment-446108433 Merged build finished. Test FAILed.
[jira] [Commented] (SPARK-26300) The `checkForStreaming` method may be called twice in `createQuery`
[ https://issues.apache.org/jira/browse/SPARK-26300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716439#comment-16716439 ] ASF GitHub Bot commented on SPARK-26300: SparkQA commented on issue #23251: [SPARK-26300][SS] Remove a redundant `checkForStreaming` call URL: https://github.com/apache/spark/pull/23251#issuecomment-446108012 **[Test build #99949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99949/testReport)** for PR 23251 at commit [`b1e71ee`](https://github.com/apache/spark/commit/b1e71ee7a723d63f1cf3c0754f2372eb185439d3).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
> The `checkForStreaming` method may be called twice in `createQuery`
> -------------------------------------------------------------------
>
> Key: SPARK-26300
> URL: https://issues.apache.org/jira/browse/SPARK-26300
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.4.0
> Reporter: liuxian
> Priority: Minor
>
> If {{checkForContinuous}} is called ({{checkForStreaming}} is called inside
> {{checkForContinuous}}), then {{checkForStreaming}} is called twice in
> {{createQuery}}. The second call is unnecessary, and since the
> {{checkForStreaming}} method executes many statements, one of the calls
> should be removed.
[jira] [Commented] (SPARK-26262) Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE
[ https://issues.apache.org/jira/browse/SPARK-26262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716440#comment-16716440 ] ASF GitHub Bot commented on SPARK-26262: SparkQA commented on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE URL: https://github.com/apache/spark/pull/23213#issuecomment-446108016 **[Test build #99945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99945/testReport)** for PR 23213 at commit [`a9c108f`](https://github.com/apache/spark/commit/a9c108fa090b847d48848cf6d679aa6747dcc534). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[jira] [Commented] (SPARK-24102) RegressionEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716452#comment-16716452 ] ASF GitHub Bot commented on SPARK-24102: AmplabJenkins commented on issue #17085: [SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator URL: https://github.com/apache/spark/pull/17085#issuecomment-446108200 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99946/ Test FAILed.
[jira] [Commented] (SPARK-24102) RegressionEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716437#comment-16716437 ] ASF GitHub Bot commented on SPARK-24102: SparkQA commented on issue #17085: [SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator URL: https://github.com/apache/spark/pull/17085#issuecomment-446108009 **[Test build #99946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99946/testReport)** for PR 17085 at commit [`aca6255`](https://github.com/apache/spark/commit/aca62557fe394d500bd084ad840f9c0ff352cde3). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[jira] [Commented] (SPARK-26098) Show associated SQL query in Job page
[ https://issues.apache.org/jira/browse/SPARK-26098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716436#comment-16716436 ] ASF GitHub Bot commented on SPARK-26098: SparkQA commented on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page URL: https://github.com/apache/spark/pull/23068#issuecomment-446108007 **[Test build #99954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99954/testReport)** for PR 23068 at commit [`0a63604`](https://github.com/apache/spark/commit/0a636049ecc721cdd31cd676fce79aeb6582dd7c). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[jira] [Commented] (SPARK-26303) Return partial results for bad JSON records
[ https://issues.apache.org/jira/browse/SPARK-26303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716448#comment-16716448 ] ASF GitHub Bot commented on SPARK-26303: AmplabJenkins commented on issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON records URL: https://github.com/apache/spark/pull/23253#issuecomment-446108177 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99953/ Test FAILed.
> Return partial results for bad JSON records
> -------------------------------------------
>
> Key: SPARK-26303
> URL: https://issues.apache.org/jira/browse/SPARK-26303
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Maxim Gekk
> Priority: Minor
>
> Currently, the JSON datasource and JSON functions return a row with all nulls
> for a malformed JSON string in the PERMISSIVE mode when the specified schema
> has the struct type. All nulls are returned even if some of the fields were
> parsed and converted to the desired types successfully. This ticket aims to
> solve the problem by returning the already parsed fields. The corrupt column,
> specified via the JSON option `columnNameOfCorruptRecord` or the SQL config,
> should contain the whole original JSON string.
> For example, if the input has one JSON string:
> {code:json}
> {"a":0.1,"b":{},"c":"def"}
> {code}
> and the specified schema is:
> {code:sql}
> a DOUBLE, b ARRAY, c STRING, _corrupt_record STRING
> {code}
> the expected output of `from_json` in the PERMISSIVE mode is:
> {code}
> +---+----+---+--------------------------+
> |a  |b   |c  |_corrupt_record           |
> +---+----+---+--------------------------+
> |0.1|null|def|{"a":0.1,"b":{},"c":"def"}|
> +---+----+---+--------------------------+
> {code}
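The expected row above can be reproduced with a small plain-Python model of the requested semantics (a sketch, not Spark's actual parser): convert field by field, keep what succeeds, null out only what fails, and record the whole original string in the corrupt column:

```python
import json

def as_int_array(value):
    # Strict array converter: a JSON object like {} must fail, not coerce.
    if not isinstance(value, list):
        raise TypeError("not an array")
    return [int(x) for x in value]

def permissive_parse(raw, schema, corrupt_col="_corrupt_record"):
    """schema maps field name -> converter; returns one row as a dict."""
    row = {name: None for name in schema}
    row[corrupt_col] = None
    try:
        obj = json.loads(raw)
    except ValueError:
        row[corrupt_col] = raw  # wholly malformed: every data field stays null
        return row
    for name, convert in schema.items():
        try:
            row[name] = convert(obj[name])
        except (KeyError, TypeError, ValueError):
            row[corrupt_col] = raw  # partial result: successful fields survive
    return row

schema = {"a": float, "b": as_int_array, "c": str}
raw = '{"a":0.1,"b":{},"c":"def"}'
print(permissive_parse(raw, schema))
# → {'a': 0.1, 'b': None, 'c': 'def', '_corrupt_record': '{"a":0.1,"b":{},"c":"def"}'}
```

Only `b` fails to convert, so `a` and `c` keep their parsed values, which is exactly the partial-result behavior the ticket asks for instead of an all-null row.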
[jira] [Commented] (SPARK-26316) Because of the perf degradation in TPC-DS, we currently partial revert SPARK-21052:Add hash map metrics to join,
[ https://issues.apache.org/jira/browse/SPARK-26316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716441#comment-16716441 ] ASF GitHub Bot commented on SPARK-26316: SparkQA commented on issue #23269: [SPARK-26316] Revert hash join metrics in spark 21052 that causes performance degradation URL: https://github.com/apache/spark/pull/23269#issuecomment-446108017 **[Test build #99951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99951/testReport)** for PR 23269 at commit [`a46d18e`](https://github.com/apache/spark/commit/a46d18e2a6ae822a1e1d903e54ab928096cb2339).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
> Because of the perf degradation in TPC-DS, we currently partially revert
> SPARK-21052: Add hash map metrics to join
> ------------------------------------------------------------------------
>
> Key: SPARK-26316
> URL: https://issues.apache.org/jira/browse/SPARK-26316
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
> Reporter: Ke Jia
> Priority: Major
>
> The code at
> [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486]
> and
> [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487]
> in SPARK-21052 causes performance degradation in Spark 2.3. The results of
> all TPC-DS queries at 1TB are in the [TPC-DS
> result|https://docs.google.com/spreadsheets/d/18a5BdOlmm8euTaRodyeWum9yu92mbWWu6JbhGXtr7yE/edit#gid=0] spreadsheet.
[jira] [Commented] (SPARK-26300) The `checkForStreaming` method may be called twice in `createQuery`
[ https://issues.apache.org/jira/browse/SPARK-26300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716454#comment-16716454 ] ASF GitHub Bot commented on SPARK-26300: AmplabJenkins commented on issue #23251: [SPARK-26300][SS] Remove a redundant `checkForStreaming` call URL: https://github.com/apache/spark/pull/23251#issuecomment-446108206 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99949/ Test FAILed.
[jira] [Commented] (SPARK-26098) Show associated SQL query in Job page
[ https://issues.apache.org/jira/browse/SPARK-26098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716443#comment-16716443 ] ASF GitHub Bot commented on SPARK-26098: AmplabJenkins commented on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page URL: https://github.com/apache/spark/pull/23068#issuecomment-446108049 Merged build finished. Test FAILed.
[jira] [Commented] (SPARK-24102) RegressionEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716451#comment-16716451 ] ASF GitHub Bot commented on SPARK-24102: AmplabJenkins commented on issue #17085: [SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator URL: https://github.com/apache/spark/pull/17085#issuecomment-446108193 Merged build finished. Test FAILed.
[jira] [Commented] (SPARK-26098) Show associated SQL query in Job page
[ https://issues.apache.org/jira/browse/SPARK-26098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716444#comment-16716444 ] ASF GitHub Bot commented on SPARK-26098: AmplabJenkins commented on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page URL: https://github.com/apache/spark/pull/23068#issuecomment-446108052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99954/ Test FAILed.
[jira] [Commented] (SPARK-24102) RegressionEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716446#comment-16716446 ] ASF GitHub Bot commented on SPARK-24102: AmplabJenkins commented on issue #17085: [SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator URL: https://github.com/apache/spark/pull/17085#issuecomment-446108108 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99952/ Test FAILed.
[jira] [Commented] (SPARK-24102) RegressionEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716438#comment-16716438 ]

ASF GitHub Bot commented on SPARK-24102:

SparkQA commented on issue #17085: [SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator
URL: https://github.com/apache/spark/pull/17085#issuecomment-446108008

**[Test build #99952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99952/testReport)** for PR 17085 at commit [`0cb2daf`](https://github.com/apache/spark/commit/0cb2daf35888d80c5c223e16505354571d87d383).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[jira] [Commented] (SPARK-26300) The `checkForStreaming` method may be called twice in `createQuery`
[ https://issues.apache.org/jira/browse/SPARK-26300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716453#comment-16716453 ]

ASF GitHub Bot commented on SPARK-26300:

AmplabJenkins commented on issue #23251: [SPARK-26300][SS] Remove a redundant `checkForStreaming` call
URL: https://github.com/apache/spark/pull/23251#issuecomment-446108201

Merged build finished. Test FAILed.

> The `checkForStreaming` method may be called twice in `createQuery`
> -------------------------------------------------------------------
>
> Key: SPARK-26300
> URL: https://issues.apache.org/jira/browse/SPARK-26300
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.4.0
> Reporter: liuxian
> Priority: Minor
>
> If {{checkForContinuous}} is called ({{checkForStreaming}} is called inside
> {{checkForContinuous}}), the {{checkForStreaming}} method ends up being called
> twice in {{createQuery}}. This is unnecessary, and since {{checkForStreaming}}
> executes many statements, it is better to remove the redundant call.
[jira] [Commented] (SPARK-26303) Return partial results for bad JSON records
[ https://issues.apache.org/jira/browse/SPARK-26303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716435#comment-16716435 ]

ASF GitHub Bot commented on SPARK-26303:

SparkQA commented on issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON records
URL: https://github.com/apache/spark/pull/23253#issuecomment-446108006

**[Test build #99953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99953/testReport)** for PR 23253 at commit [`9ca9248`](https://github.com/apache/spark/commit/9ca9248ed3f9314747c1415bd19760c53019bf36).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class PartialResultException(`
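The PERMISSIVE-mode behavior SPARK-26303 asks for - keep the fields that did parse, set only the failed ones to null, and store the whole raw record in the corrupt-record column - can be sketched in plain Python, outside Spark. The schema representation and converter names below are illustrative only:

```python
import json

def as_int_array(value):
    """Converter for an array-of-int field; rejects non-arrays."""
    if not isinstance(value, list):
        raise TypeError("expected a JSON array")
    return [int(x) for x in value]

def parse_permissive(record, schema):
    """Parse `record` against `schema` (a field-name -> converter map).
    Successfully converted fields are kept; any failure leaves that
    field null and stores the original string in `_corrupt_record`."""
    row = {name: None for name in schema}
    row["_corrupt_record"] = None
    try:
        data = json.loads(record)
    except ValueError:
        row["_corrupt_record"] = record  # nothing parsed at all
        return row
    for name, convert in schema.items():
        try:
            row[name] = convert(data[name])
        except (KeyError, TypeError, ValueError):
            row["_corrupt_record"] = record  # partial result: keep raw input
    return row

schema = {"a": float, "b": as_int_array, "c": str}
row = parse_permissive('{"a":0.1,"b":{},"c":"def"}', schema)
print(row)
```

Here `b` fails to convert (an object where an array was expected), so `_corrupt_record` holds the full input string, while `a` and `c` keep their parsed values instead of being discarded - matching the expected output table in the ticket.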
[jira] [Commented] (SPARK-26303) Return partial results for bad JSON records
[ https://issues.apache.org/jira/browse/SPARK-26303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716447#comment-16716447 ]

ASF GitHub Bot commented on SPARK-26303:

AmplabJenkins commented on issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON records
URL: https://github.com/apache/spark/pull/23253#issuecomment-446108172

Merged build finished. Test FAILed.
[jira] [Commented] (SPARK-24102) RegressionEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716445#comment-16716445 ]

ASF GitHub Bot commented on SPARK-24102:

AmplabJenkins commented on issue #17085: [SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator
URL: https://github.com/apache/spark/pull/17085#issuecomment-446108099

Merged build finished. Test FAILed.
[jira] [Commented] (SPARK-26316) Because of perf degradation in TPC-DS, partially revert SPARK-21052: Add hash map metrics to join
[ https://issues.apache.org/jira/browse/SPARK-26316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716450#comment-16716450 ]

ASF GitHub Bot commented on SPARK-26316:

AmplabJenkins commented on issue #23269: [SPARK-26316] Revert hash join metrics in SPARK-21052 that cause performance degradation
URL: https://github.com/apache/spark/pull/23269#issuecomment-446108187

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99951/
Test FAILed.

> Because of perf degradation in TPC-DS, partially revert SPARK-21052: Add hash map metrics to join
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-26316
> URL: https://issues.apache.org/jira/browse/SPARK-26316
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
> Reporter: Ke Jia
> Priority: Major
>
> The code at
> [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486]
> and
> [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487],
> added in SPARK-21052, causes performance degradation in Spark 2.3. Results for
> all TPC-DS queries at 1TB are in the [TPC-DS
> result|https://docs.google.com/spreadsheets/d/18a5BdOlmm8euTaRodyeWum9yu92mbWWu6JbhGXtr7yE/edit#gid=0] spreadsheet.
[jira] [Commented] (SPARK-26316) Because of perf degradation in TPC-DS, partially revert SPARK-21052: Add hash map metrics to join
[ https://issues.apache.org/jira/browse/SPARK-26316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716449#comment-16716449 ]

ASF GitHub Bot commented on SPARK-26316:

AmplabJenkins commented on issue #23269: [SPARK-26316] Revert hash join metrics in SPARK-21052 that cause performance degradation
URL: https://github.com/apache/spark/pull/23269#issuecomment-446108181

Merged build finished. Test FAILed.
[jira] [Commented] (SPARK-26303) Return partial results for bad JSON records
[ https://issues.apache.org/jira/browse/SPARK-26303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716442#comment-16716442 ]

ASF GitHub Bot commented on SPARK-26303:

SparkQA removed a comment on issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON records
URL: https://github.com/apache/spark/pull/23253#issuecomment-446084058

**[Test build #99953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99953/testReport)** for PR 23253 at commit [`9ca9248`](https://github.com/apache/spark/commit/9ca9248ed3f9314747c1415bd19760c53019bf36).
[jira] [Updated] (SPARK-26324) Spark submit does not work with Mesos over SSL
[ https://issues.apache.org/jira/browse/SPARK-26324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jorge Machado updated SPARK-26324:
----------------------------------
Description:

Hi guys, I was trying to run the examples on a Mesos cluster that uses HTTPS. I tried with the REST endpoint:

{code:java}
./spark-submit --class org.apache.spark.examples.SparkPi --master mesos://:5050 --conf spark.master.rest.enabled=true --deploy-mode cluster --supervise --executor-memory 10G --total-executor-cores 100 ../examples/jars/spark-examples_2.11-2.4.0.jar 1000
{code}

The error that I get on the host where I started spark-submit is:

{code:java}
2018-12-10 15:08:39 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-12-10 15:08:39 INFO RestSubmissionClient:54 - Submitting a request to launch an application in mesos://:5050.
2018-12-10 15:08:39 WARN RestSubmissionClient:66 - Unable to connect to server mesos://:5050.
Exception in thread "main" org.apache.spark.deploy.rest.SubmitRestConnectionException: Unable to connect to server
	at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$createSubmission$3.apply(RestSubmissionClient.scala:104)
	at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$createSubmission$3.apply(RestSubmissionClient.scala:86)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
	at org.apache.spark.deploy.rest.RestSubmissionClient.createSubmission(RestSubmissionClient.scala:86)
	at org.apache.spark.deploy.rest.RestSubmissionClientApp.run(RestSubmissionClient.scala:443)
	at org.apache.spark.deploy.rest.RestSubmissionClientApp.start(RestSubmissionClient.scala:455)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.deploy.rest.SubmitRestConnectionException: Unable to connect to server
	at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:281)
	at org.apache.spark.deploy.rest.RestSubmissionClient.org$apache$spark$deploy$rest$RestSubmissionClient$$postJson(RestSubmissionClient.scala:225)
	at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$createSubmission$3.apply(RestSubmissionClient.scala:90)
	... 15 more
Caused by: java.net.SocketException: Connection reset
{code}

I'm pretty sure this is because of the hardcoded http:// here:

{code:java}
// RestSubmissionClient.scala
/** Return the base URL for communicating with the server, including the protocol version. */
private def getBaseUrl(master: String): String = {
  var masterUrl = master
  supportedMasterPrefixes.foreach { prefix =>
    if (master.startsWith(prefix)) {
      masterUrl = master.stripPrefix(prefix)
    }
  }
  masterUrl = masterUrl.stripSuffix("/")
  s"http://$masterUrl/$PROTOCOL_VERSION/submissions"  // <--- hardcoded http
}
{code}

Then I tried without the _--deploy-mode cluster_ option and I get:

{code:java}
./spark-submit --class org.apache.spark.examples.SparkPi --master mesos://:5050 --supervise --executor-memory 10G --total-executor-cores 100 ../examples/jars/spark-examples_2.11-2.4.0.jar 1000
{code}

On the Spark console I get:

{code:java}
2018-12-10 15:01:05 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://_host:4040
2018-12-10 15:01:05 INFO SparkContext:54 - Added JAR file:/home//spark-2.4.0-bin-hadoop2.7/bin/../examples/jars/spark-examples_2.11-2.4.0.jar at spark://_host:35719/jars/spark-examples_2.11-2.4.0.jar with timestamp 1544450465799
I1210 15:01:05.963078 37943 sched.cpp:232] Version: 1.3.2
I1210 15:01:05.966814 37911 sched.cpp:336] New master detected at master@53.54.195.251:5050
I1210 15:01:05.967010 37911 sched.cpp:352] No credentials provided. Attempting to register without authentication
E1210 15:01:05.967347 37942 process.cpp:2455] Failed to shutdown socket with fd 307, address 53.54.195.251:45206: Transport endpoint is not connected
E1210 15:01:05.968212 37942 process.cpp:2369] Failed to shutdown socket with fd 307, address 53.54.195.251:45212: Transport endpoint is not connected
E1210 15:01:05.969405 37942 process.cpp:2455] Failed to shutdown socket with fd 307, address 53.54.195.251:45222:
{code}
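The reporter's diagnosis - `getBaseUrl` unconditionally prepends `http://` - suggests making the scheme selectable. The following is a plain-Python sketch of that idea only; the `use_https` parameter and prefix list are hypothetical, not Spark's actual fix:

```python
def get_base_url(master, protocol_version="v1", use_https=False):
    """Strip a known master prefix (e.g. "mesos://") and build the
    REST submission URL, choosing the scheme instead of hardcoding it."""
    for prefix in ("spark://", "mesos://"):
        if master.startswith(prefix):
            master = master[len(prefix):]
            break
    scheme = "https" if use_https else "http"
    return f"{scheme}://{master.rstrip('/')}/{protocol_version}/submissions"

print(get_base_url("mesos://host:5050"))                  # plain http, as today
print(get_base_url("mesos://host:5050", use_https=True))  # https for SSL masters
```

The key point is that the scheme becomes a decision made from configuration (or from the master URL itself) rather than a string literal baked into the URL template.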