[jira] [Commented] (SPARK-38189) Add priority scheduling doc for Spark on K8S
[ https://issues.apache.org/jira/browse/SPARK-38189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501867#comment-17501867 ] Yikun Jiang commented on SPARK-38189: - [~dongjoon] Thanks for the information, I re-created a JIRA: SPARK-38423 for "Support priority scheduling with volcano implementations." > Add priority scheduling doc for Spark on K8S > > > Key: SPARK-38189 > URL: https://issues.apache.org/jira/browse/SPARK-38189 > Project: Spark > Issue Type: Task > Components: Documentation, Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.3.0, 3.2.2 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38423) Support priority scheduling with volcano implementations
[ https://issues.apache.org/jira/browse/SPARK-38423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501866#comment-17501866 ] Apache Spark commented on SPARK-38423: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/35639 > Support priority scheduling with volcano implementations > > > Key: SPARK-38423 > URL: https://issues.apache.org/jira/browse/SPARK-38423 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major >
[jira] [Assigned] (SPARK-38423) Support priority scheduling with volcano implementations
[ https://issues.apache.org/jira/browse/SPARK-38423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38423: Assignee: Apache Spark
[jira] [Commented] (SPARK-38423) Support priority scheduling with volcano implementations
[ https://issues.apache.org/jira/browse/SPARK-38423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501865#comment-17501865 ] Apache Spark commented on SPARK-38423: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/35639
[jira] [Assigned] (SPARK-38423) Support priority scheduling with volcano implementations
[ https://issues.apache.org/jira/browse/SPARK-38423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38423: Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-38423) Support priority scheduling with volcano implementations
Yikun Jiang created SPARK-38423: --- Summary: Support priority scheduling with volcano implementations Key: SPARK-38423 URL: https://issues.apache.org/jira/browse/SPARK-38423 Project: Spark Issue Type: Sub-task Components: Kubernetes Affects Versions: 3.3.0 Reporter: Yikun Jiang
[jira] [Resolved] (SPARK-38135) Introduce `spark.kubernetes.job` scheduling related configurations
[ https://issues.apache.org/jira/browse/SPARK-38135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang resolved SPARK-38135. - Resolution: Invalid > Introduce `spark.kubernetes.job` scheduling related configurations > -- > > Key: SPARK-38135 > URL: https://issues.apache.org/jira/browse/SPARK-38135 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major > > spark.kubernetes.job.minCPU: the minimum cpu resources for running job > spark.kubernetes.job.minMemory: the minimum memory resources for running job > spark.kubernetes.job.minMember: the minimum number of pods for running job > spark.kubernetes.job.priorityClassName: the priority of the running job > spark.kubernetes.job.queue: the queue to which the running job belongs
[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running
[ https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gabrywu updated SPARK-38258: Description: Table & column statistics are very important to the Spark SQL optimizer; however, we have to collect & update them manually using {code:java} analyze table tableName compute statistics{code} This is a little inconvenient, so why can't we {color:#ff}collect & update statistics automatically{color} when a Spark stage runs and finishes? For example, when an insert overwrite table statement finishes, we can update the corresponding table statistics using SQL metrics, and subsequent queries can then be optimized with these statistics. It's a common case that we run daily batches with Spark SQL, so the same SQL runs every day while the SQL and its corresponding tables' data change slowly. That means we can use statistics updated yesterday to optimize current SQLs, and we can also adjust important configs such as spark.sql.shuffle.partitions. So we'd better add a mechanism to store every stage's statistics somewhere and use them in new SQLs, not just collect statistics after a stage finishes. Of course, we'd better {color:#ff}add a version number to the statistics{color} in case they go stale. https://docs.google.com/document/d/1L48Dovynboi_ARu-OqQNJCOQqeVUTutLu8fo-w_ZPPA/edit# was: As we all know, table & column statistics are very important to spark SQL optimizer, however we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we {color:#ff}collect & update statistics automatically{color} when a spark stage runs and finishes? For example, when a insert overwrite table statement finishes, we can update a corresponding table statistics using SQL metrics. And in following queries, spark sql optimizer can use these statistics. As we all know, it's a common case that we run daily batches using Spark SQLs, so a same SQL can run every day, and the SQL and its corresponding tables data change slowly. That means we can use statistics updated on yesterday to optimize current SQLs, of course can also adjust the important configs, such as spark.sql.shuffle.partitions So we'd better add a mechanism to store every stage's statistics somewhere, and use it in new SQLs. Not just collect statistics after a stage finishes. Of course, we'd better {color:#ff}add a version number to statistics{color} in case of losing efficacy
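The manual flow this proposal wants to automate is the standard Spark SQL statistics workflow; a quick sketch (table name `sales` and its columns are illustrative):

```sql
-- Table-level statistics: row count and size in bytes
ANALYZE TABLE sales COMPUTE STATISTICS;

-- Column-level statistics (min/max, distinct count, null count), used by the cost-based optimizer
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS id, amount;

-- Inspect the collected statistics
DESCRIBE EXTENDED sales;
```

The proposal is essentially to have Spark refresh the equivalent of the first two statements automatically from runtime SQL metrics after each write, instead of relying on users to run them.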
[jira] [Commented] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running
[ https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501864#comment-17501864 ] gabrywu commented on SPARK-38258: - [~yumwang] what do you think of it?
[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running
[ https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gabrywu updated SPARK-38258: Description: As we all know, table & column statistics are very important to spark SQL optimizer, however we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we {color:#ff}collect & update statistics automatically{color} when a spark stage runs and finishes? For example, when a insert overwrite table statement finishes, we can update a corresponding table statistics using SQL metrics. And in following queries, spark sql optimizer can use these statistics. As we all know, it's a common case that we run daily batches using Spark SQLs, so a same SQL can run every day, and the SQL and its corresponding tables data change slowly. That means we can use statistics updated on yesterday to optimize current SQLs, of course can also adjust the important configs, such as spark.sql.shuffle.partitions So we'd better add a mechanism to store every stage's statistics somewhere, and use it in new SQLs. Not just collect statistics after a stage finishes. Of course, we'd better {color:#ff}add a version number to statistics{color} in case of losing efficacy was: As we all know, table & column statistics are very important to spark SQL optimizer, however we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we {color:#ff}collect & update statistics automatically{color} when a spark stage runs and finishes? For example, when a insert overwrite table statement finishes, we can update a corresponding table statistics using SQL metrics. And in following queries, spark sql optimizer can use these statistics. As we all know, it's a common case that we run daily batches using Spark SQLs, so a same SQL can run every day, and the SQL and its corresponding tables data change slowly. That means we can use statistics updated on yesterday to optimize current SQLs. So we'd better add a mechanism to store every stage's statistics somewhere, and use it in new SQLs. Not just collect statistics after a stage finishes. Of course, we'd better {color:#ff}add a version number to statistics{color} in case of losing efficacy
[jira] [Assigned] (SPARK-37426) Inline type hints for python/pyspark/mllib/regression.py
[ https://issues.apache.org/jira/browse/SPARK-37426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37426: -- Assignee: Maciej Szymkiewicz > Inline type hints for python/pyspark/mllib/regression.py > > > Key: SPARK-37426 > URL: https://issues.apache.org/jira/browse/SPARK-37426 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/mllib/regression.pyi to > python/pyspark/mllib/regression.py
[jira] [Resolved] (SPARK-37426) Inline type hints for python/pyspark/mllib/regression.py
[ https://issues.apache.org/jira/browse/SPARK-37426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-37426. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35585 [https://github.com/apache/spark/pull/35585]
[jira] [Resolved] (SPARK-37400) Inline type hints for python/pyspark/mllib/classification.py
[ https://issues.apache.org/jira/browse/SPARK-37400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-37400. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35585 [https://github.com/apache/spark/pull/35585] > Inline type hints for python/pyspark/mllib/classification.py > > > Key: SPARK-37400 > URL: https://issues.apache.org/jira/browse/SPARK-37400 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > Fix For: 3.3.0 > > > Inline type hints from python/pyspark/mllib/classification.pyi to > python/pyspark/mllib/classification.py.
[jira] [Assigned] (SPARK-37400) Inline type hints for python/pyspark/mllib/classification.py
[ https://issues.apache.org/jira/browse/SPARK-37400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37400: -- Assignee: Maciej Szymkiewicz
[jira] [Assigned] (SPARK-37430) Inline type hints for python/pyspark/mllib/linalg/distributed.py
[ https://issues.apache.org/jira/browse/SPARK-37430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37430: Assignee: (was: Apache Spark) > Inline type hints for python/pyspark/mllib/linalg/distributed.py > > > Key: SPARK-37430 > URL: https://issues.apache.org/jira/browse/SPARK-37430 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/mllib/linalg/distributed.pyi to > python/pyspark/mllib/linalg/distributed.py
[jira] [Assigned] (SPARK-37430) Inline type hints for python/pyspark/mllib/linalg/distributed.py
[ https://issues.apache.org/jira/browse/SPARK-37430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37430: Assignee: Apache Spark
[jira] [Commented] (SPARK-37430) Inline type hints for python/pyspark/mllib/linalg/distributed.py
[ https://issues.apache.org/jira/browse/SPARK-37430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501837#comment-17501837 ] Apache Spark commented on SPARK-37430: -- User 'hi-zir' has created a pull request for this issue: https://github.com/apache/spark/pull/35739
[jira] [Created] (SPARK-38422) Encryption algorithms should be used with secure mode and padding scheme
Bjørn Jørgensen created SPARK-38422: --- Summary: Encryption algorithms should be used with secure mode and padding scheme Key: SPARK-38422 URL: https://issues.apache.org/jira/browse/SPARK-38422 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Bjørn Jørgensen I have scanned the Java files with SonarQube, and in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java it flags: {code:java} try { if (mode.equalsIgnoreCase("ECB") && (padding.equalsIgnoreCase("PKCS") || padding.equalsIgnoreCase("DEFAULT"))) { Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding"); {code} The encryption operation mode and the padding scheme should be chosen appropriately to guarantee data confidentiality, integrity and authenticity. For block cipher encryption algorithms (like AES): GCM (Galois/Counter Mode), which internally works with a zero/no padding scheme, is recommended, as it is designed to provide both data authenticity (integrity) and confidentiality. Other similar modes are CCM, CWC, EAX, IAPM and OCB. The CBC (Cipher Block Chaining) mode by itself provides only data confidentiality; it's recommended to use it along with a Message Authentication Code or similar to achieve data authenticity (integrity) too, and thus to prevent padding oracle attacks. The ECB (Electronic Codebook) mode doesn't provide serious message confidentiality: under a given key, any given plaintext block always gets encrypted to the same ciphertext block. This mode should not be used. For the RSA encryption algorithm, the recommended padding scheme is OAEP.
[OWASP Top 10 2021|https://owasp.org/Top10/A02_2021-Cryptographic_Failures/] Category A2 - Cryptographic Failures [OWASP Top 10 2017|https://owasp.org/www-project-top-ten/2017/A6_2017-Security_Misconfiguration.html] Category A6 - Security Misconfiguration [Mobile AppSec|https://mobile-security.gitbook.io/masvs/security-requirements/0x08-v3-cryptography_verification_requirements] Verification Standard - Cryptography Requirements [OWASP Mobile Top 10 2016|https://owasp.org/www-project-mobile-top-10/2016-risks/m5-insufficient-cryptography] Category M5 - Insufficient Cryptography [MITRE, CWE-327|https://cwe.mitre.org/data/definitions/327.html] - Use of a Broken or Risky Cryptographic Algorithm [CERT, MSC61-J.|https://wiki.sei.cmu.edu/confluence/display/java/MSC61-J.+Do+not+use+insecure+or+weak+cryptographic+algorithms] - Do not use insecure or weak cryptographic algorithms [SANS Top 25|https://www.sans.org/top25-software-errors/#cat3] - Porous Defenses
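This is not Spark's actual code, but as a minimal sketch of the recommended direction, here is plain-JCA AES-GCM with a fresh random IV per message and the IV prepended to the ciphertext (class and method names are mine, chosen for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class GcmSketch {
  private static final int IV_LEN = 12;    // 96-bit IV, the common GCM recommendation
  private static final int TAG_BITS = 128; // authentication tag length

  static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
    byte[] iv = new byte[IV_LEN];
    new SecureRandom().nextBytes(iv);      // fresh random IV per message
    Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
    c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
    byte[] ct = c.doFinal(plaintext);
    byte[] out = new byte[IV_LEN + ct.length]; // prepend IV so the receiver can decrypt
    System.arraycopy(iv, 0, out, 0, IV_LEN);
    System.arraycopy(ct, 0, out, IV_LEN, ct.length);
    return out;
  }

  static byte[] decrypt(SecretKey key, byte[] msg) throws Exception {
    Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
    c.init(Cipher.DECRYPT_MODE, key,
        new GCMParameterSpec(TAG_BITS, Arrays.copyOf(msg, IV_LEN)));
    // throws AEADBadTagException if the ciphertext or IV was tampered with
    return c.doFinal(msg, IV_LEN, msg.length - IV_LEN);
  }

  public static void main(String[] args) throws Exception {
    KeyGenerator kg = KeyGenerator.getInstance("AES");
    kg.init(128);
    SecretKey key = kg.generateKey();
    byte[] sealed = encrypt(key, "hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(decrypt(key, sealed), StandardCharsets.UTF_8));
  }
}
```

Because GCM is authenticated, decryption fails on any tampering, which is exactly the integrity property the ECB and plain-CBC variants above lack.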
[jira] [Created] (SPARK-38421) Cipher Block Chaining IVs should be unpredictable
Bjørn Jørgensen created SPARK-38421: --- Summary: Cipher Block Chaining IVs should be unpredictable Key: SPARK-38421 URL: https://issues.apache.org/jira/browse/SPARK-38421 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 3.3.0 Reporter: Bjørn Jørgensen I have scanned the Java files with SonarQube, and in https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java it flags: {code:java} @VisibleForTesting CryptoOutputStream createOutputStream(WritableByteChannel ch) throws IOException { return new CryptoOutputStream(cipher, conf, ch, key, new IvParameterSpec(outIv)); } @VisibleForTesting CryptoInputStream createInputStream(ReadableByteChannel ch) throws IOException { return new CryptoInputStream(cipher, conf, ch, key, new IvParameterSpec(inIv)); } {code} When encrypting data with the Cipher Block Chaining (CBC) mode, an Initialization Vector (IV) is used to randomize the encryption, i.e. under a given key the same plaintext doesn't always produce the same ciphertext. The IV doesn't need to be secret, but it should be unpredictable to avoid a chosen-plaintext attack. To generate Initialization Vectors, NIST recommends using a secure random number generator.
[OWASP Top 10 2021|https://owasp.org/Top10/A02_2021-Cryptographic_Failures/] Category A2 - Cryptographic Failures [OWASP Top 10|https://owasp.org/www-project-top-ten/2017/A6_2017-Security_Misconfiguration.html] 2017 Category A6 - Security Misconfiguration [MITRE, CWE-329|https://cwe.mitre.org/data/definitions/329.html] - CWE-329: Not Using an Unpredictable IV with CBC Mode [MITRE, CWE-330|https://cwe.mitre.org/data/definitions/330.html] - Use of Insufficiently Random Values [NIST, SP-800-38A|https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf] - Recommendation for Block Cipher Modes of Operation Derived from FindSecBugs [rule STATIC_IV|https://find-sec-bugs.github.io/bugs.htm#STATIC_IV]
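As a sketch of the fix direction (a hypothetical helper of mine, not the TransportCipher API): generate a fresh IV from SecureRandom for every CBC encryption rather than reusing a static one.

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class CbcIvSketch {
  private static final SecureRandom RNG = new SecureRandom();

  // One fresh, unpredictable 16-byte IV per encryption, per NIST SP 800-38A guidance
  static IvParameterSpec freshIv() {
    byte[] iv = new byte[16]; // AES block size
    RNG.nextBytes(iv);
    return new IvParameterSpec(iv);
  }

  static byte[] encrypt(SecretKey key, IvParameterSpec iv, byte[] plaintext) throws Exception {
    Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
    c.init(Cipher.ENCRYPT_MODE, key, iv);
    return c.doFinal(plaintext);
  }
}
```

Encrypting the same plaintext twice under two fresh IVs yields different ciphertexts, which is precisely the property a static or predictable IV destroys.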
[jira] [Resolved] (SPARK-38393) Clean up deprecated usage of GenSeq/GenMap
[ https://issues.apache.org/jira/browse/SPARK-38393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-38393. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35713 [https://github.com/apache/spark/pull/35713] > Clean up deprecated usage of GenSeq/GenMap > -- > > Key: SPARK-38393 > URL: https://issues.apache.org/jira/browse/SPARK-38393 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.3.0 > > > GenSeq/GenMap is identified as @deprecated since Scala 2.13.0 and Gen* > collection types have been removed. >
[jira] [Assigned] (SPARK-38393) Clean up deprecated usage of GenSeq/GenMap
[ https://issues.apache.org/jira/browse/SPARK-38393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-38393: Assignee: Yang Jie
[jira] [Created] (SPARK-38420) Upgrade bcprov-jdk15on from 1.60 to 1.67
Bjørn Jørgensen created SPARK-38420: --- Summary: Upgrade bcprov-jdk15on from 1.60 to 1.67 Key: SPARK-38420 URL: https://issues.apache.org/jira/browse/SPARK-38420 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.3.0 Reporter: Bjørn Jørgensen Upgrade bcprov-jdk15on from 1.60 to 1.67 [CVE-2020-15522|https://nvd.nist.gov/vuln/detail/CVE-2020-15522] [releasenotes.|https://github.com/bcgit/bc-java/blob/master/docs/releasenotes.html]
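Assuming the dependency is managed through Maven, the bump described here would be a version change along these lines (coordinates are Bouncy Castle's published artifact; the exact location in Spark's build files is not shown in this message):

```xml
<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcprov-jdk15on</artifactId>
  <!-- was 1.60; 1.67 addresses CVE-2020-15522 -->
  <version>1.67</version>
</dependency>
```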