[jira] [Updated] (SPARK-27322) DataSourceV2: Select from multiple catalogs
[ https://issues.apache.org/jira/browse/SPARK-27322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Zhuge updated SPARK-27322:
Summary: DataSourceV2: Select from multiple catalogs (was: DataSourceV2: Logical relation in multiple catalogs)

> DataSourceV2: Select from multiple catalogs
>
> Key: SPARK-27322
> URL: https://issues.apache.org/jira/browse/SPARK-27322
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: John Zhuge
> Priority: Major
>
> Support multi-catalog in the following SELECT code paths:
> * SELECT * FROM catalog.db.tbl
> * TABLE catalog.db.tbl
> * JOIN or UNION tables from different catalogs
> * SparkSession.table("catalog.db.tbl")

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
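To make the name-resolution part of this feature concrete, here is a minimal, hypothetical Java sketch of how a multi-part identifier such as catalog.db.tbl could be mapped to a catalog, a namespace and a table: if the first part is a registered catalog name it selects that catalog, otherwise the whole name falls back to a default catalog. The class MultiPartName, its resolve method and the default name session_catalog are assumptions for illustration. This is not Spark's DataSourceV2 resolution code, and it ignores quoted identifiers that contain dots.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of multi-part name resolution for "catalog.db.tbl".
// Assumptions: names are split on '.', quoted identifiers are not supported,
// and "session_catalog" stands in for the default catalog.
public class MultiPartName {
    static final String DEFAULT_CATALOG = "session_catalog";

    // Result of resolving a name: catalog, namespace parts, and table name.
    static final class Resolved {
        final String catalog;
        final List<String> namespace;
        final String table;

        Resolved(String catalog, List<String> namespace, String table) {
            this.catalog = catalog;
            this.namespace = namespace;
            this.table = table;
        }

        @Override
        public String toString() {
            return catalog + " / " + namespace + " / " + table;
        }
    }

    static Resolved resolve(String name, Set<String> registeredCatalogs) {
        List<String> parts = Arrays.asList(name.split("\\."));
        String table = parts.get(parts.size() - 1);
        if (parts.size() > 1 && registeredCatalogs.contains(parts.get(0))) {
            // First part names a registered catalog: the middle parts are the namespace.
            return new Resolved(parts.get(0), parts.subList(1, parts.size() - 1), table);
        }
        // Otherwise everything before the table name is a namespace in the default catalog.
        return new Resolved(DEFAULT_CATALOG, parts.subList(0, parts.size() - 1), table);
    }

    public static void main(String[] args) {
        Set<String> catalogs = Set.of("prod", "testcat");
        System.out.println(resolve("prod.db.tbl", catalogs)); // resolved against catalog "prod"
        System.out.println(resolve("db.tbl", catalogs));      // falls back to the default catalog
    }
}
```

Checking the catalog name first keeps single-catalog queries (db.tbl) working unchanged while letting a JOIN or UNION mix names from different catalogs.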
[jira] [Updated] (SPARK-27147) Create new unit test cases for SortShuffleWriter
[ https://issues.apache.org/jira/browse/SPARK-27147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27147:
Component/s: Tests

> Create new unit test cases for SortShuffleWriter
>
> Key: SPARK-27147
> URL: https://issues.apache.org/jira/browse/SPARK-27147
> Project: Spark
> Issue Type: Test
> Components: Spark Core, Tests
> Affects Versions: 3.0.0
> Reporter: wangjiaochun
> Assignee: wangjiaochun
> Priority: Minor
> Fix For: 3.0.0
>
> Create new unit test cases for SortShuffleWriter
[jira] [Updated] (SPARK-27147) Create new unit test cases for SortShuffleWriter
[ https://issues.apache.org/jira/browse/SPARK-27147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27147:
Issue Type: Improvement (was: Test)

> Create new unit test cases for SortShuffleWriter
>
> Key: SPARK-27147
> URL: https://issues.apache.org/jira/browse/SPARK-27147
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, Tests
> Affects Versions: 3.0.0
> Reporter: wangjiaochun
> Assignee: wangjiaochun
> Priority: Minor
> Fix For: 3.0.0
>
> Create new unit test cases for SortShuffleWriter
[jira] [Resolved] (SPARK-27147) Create new unit test cases for SortShuffleWriter
[ https://issues.apache.org/jira/browse/SPARK-27147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-27147.
Resolution: Fixed
Assignee: wangjiaochun

This is resolved via https://github.com/apache/spark/pull/24080

> Create new unit test cases for SortShuffleWriter
>
> Key: SPARK-27147
> URL: https://issues.apache.org/jira/browse/SPARK-27147
> Project: Spark
> Issue Type: Test
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: wangjiaochun
> Assignee: wangjiaochun
> Priority: Minor
> Fix For: 3.0.0
>
> Create new unit test cases for SortShuffleWriter
[jira] [Assigned] (SPARK-27841) Improve UTF8String fromString()/toString()/numChars() performance when strings are ASCII
[ https://issues.apache.org/jira/browse/SPARK-27841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27841:
Assignee: Josh Rosen (was: Apache Spark)

> Improve UTF8String fromString()/toString()/numChars() performance when strings are ASCII
>
> Key: SPARK-27841
> URL: https://issues.apache.org/jira/browse/SPARK-27841
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Josh Rosen
> Assignee: Josh Rosen
> Priority: Major
>
> UTF8String's fromString(), toString(), and numChars() methods are performance
> hotspots. For strings which consist entirely of ASCII characters we can make
> performance optimizations which significantly reduce memory allocation and
> copying, greatly improving performance for many common workloads.
[jira] [Assigned] (SPARK-27841) Improve UTF8String fromString()/toString()/numChars() performance when strings are ASCII
[ https://issues.apache.org/jira/browse/SPARK-27841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27841:
Assignee: Apache Spark (was: Josh Rosen)

> Improve UTF8String fromString()/toString()/numChars() performance when strings are ASCII
>
> Key: SPARK-27841
> URL: https://issues.apache.org/jira/browse/SPARK-27841
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Josh Rosen
> Assignee: Apache Spark
> Priority: Major
>
> UTF8String's fromString(), toString(), and numChars() methods are performance
> hotspots. For strings which consist entirely of ASCII characters we can make
> performance optimizations which significantly reduce memory allocation and
> copying, greatly improving performance for many common workloads.
[jira] [Resolved] (SPARK-27711) InputFileBlockHolder should be unset at the end of tasks
[ https://issues.apache.org/jira/browse/SPARK-27711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-27711.
Resolution: Fixed
Assignee: Jose Torres
Fix Version/s: 3.0.0, 2.4.4

This is resolved via
- https://github.com/apache/spark/pull/24605 (master)
- https://github.com/apache/spark/pull/24690 (branch-2.4)

> InputFileBlockHolder should be unset at the end of tasks
>
> Key: SPARK-27711
> URL: https://issues.apache.org/jira/browse/SPARK-27711
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Spark Core
> Affects Versions: 2.4.3
> Reporter: Jose Torres
> Assignee: Jose Torres
> Priority: Major
> Fix For: 2.4.4, 3.0.0
>
> InputFileBlockHolder should be unset at the end of each task. Otherwise, the
> value of input_file_name() can leak into other tasks instead of starting as
> an empty string.
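One way such a leak can happen is the usual hazard of thread-local state on pooled worker threads: a value set by one task survives into the next task scheduled on the same thread unless it is cleared. The sketch below models this with a plain ThreadLocal and a single-thread executor. CURRENT_FILE, runTask and secondTaskSees are hypothetical stand-ins, not Spark's InputFileBlockHolder API, and the fix is modeled as clearing the value in a finally block at the end of every task.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model of per-task state kept in a ThreadLocal on a reused worker thread.
public class ThreadLocalLeakDemo {
    // Stand-in for a holder like InputFileBlockHolder; "" means "no file".
    static final ThreadLocal<String> CURRENT_FILE = ThreadLocal.withInitial(() -> "");

    // One "task": records what it sees at start, optionally sets a file name,
    // and, if unsetAtEnd is true, clears the holder in a finally block.
    static String runTask(String fileToRead, boolean unsetAtEnd) {
        String seenAtStart = CURRENT_FILE.get();
        try {
            if (fileToRead != null) {
                CURRENT_FILE.set(fileToRead);
            }
            return seenAtStart;
        } finally {
            if (unsetAtEnd) {
                CURRENT_FILE.remove(); // the fix: reset state at the end of every task
            }
        }
    }

    // Runs two tasks on the same pooled thread; returns what the second one saw at start.
    static String secondTaskSees(boolean unsetAtEnd) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> runTask("part-00000.parquet", unsetAtEnd)).get();
            return pool.submit(() -> runTask(null, unsetAtEnd)).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without unset: '" + secondTaskSees(false) + "'");
        System.out.println("with unset:    '" + secondTaskSees(true) + "'");
    }
}
```

Without the finally block, the second task starts with the first task's file name; with it, the second task starts from the empty string, which is the behavior the issue asks for.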
[jira] [Created] (SPARK-27841) Improve UTF8String fromString()/toString()/numChars() performance when strings are ASCII
Josh Rosen created SPARK-27841:

Summary: Improve UTF8String fromString()/toString()/numChars() performance when strings are ASCII
Key: SPARK-27841
URL: https://issues.apache.org/jira/browse/SPARK-27841
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Josh Rosen
Assignee: Josh Rosen

UTF8String's fromString(), toString(), and numChars() methods are performance hotspots. For strings which consist entirely of ASCII characters we can make performance optimizations which significantly reduce memory allocation and copying, greatly improving performance for many common workloads.
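The idea can be sketched in plain Java: when every char of a String is below 0x80, its UTF-8 encoding is exactly one byte per char, so encoding reduces to a narrowing copy, and if that fact is cached, the character count is simply the byte length. The class AsciiUtf8String below is hypothetical and is not UTF8String's actual implementation; it only illustrates an ASCII fast path with a general fallback.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of an ASCII fast path for a UTF-8 string wrapper.
// Not Spark's UTF8String; it only illustrates the idea from the issue.
public final class AsciiUtf8String {
    private final byte[] bytes;
    private final boolean ascii; // cached: true if every byte is < 0x80

    private AsciiUtf8String(byte[] bytes, boolean ascii) {
        this.bytes = bytes;
        this.ascii = ascii;
    }

    static AsciiUtf8String fromString(String s) {
        int n = s.length();
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c >= 0x80) {
                // Non-ASCII input: discard the partial copy and use the general UTF-8 encoder.
                return new AsciiUtf8String(s.getBytes(StandardCharsets.UTF_8), false);
            }
            out[i] = (byte) c; // ASCII: exactly one byte per char
        }
        return new AsciiUtf8String(out, true);
    }

    // Number of characters (code points).
    int numChars() {
        if (ascii) {
            return bytes.length; // O(1): one byte per character
        }
        int count = 0;
        for (byte b : bytes) {
            // Count every byte that is not a UTF-8 continuation byte (10xxxxxx).
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    @Override
    public String toString() {
        // For pure-ASCII bytes, ISO-8859-1 decoding yields the same chars as UTF-8.
        return new String(bytes, ascii ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(fromString("hello").numChars());
        System.out.println(fromString("h\u00e9llo").numChars());
    }
}
```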
[jira] [Resolved] (SPARK-27801) InMemoryFileIndex.listLeafFiles should use listLocatedStatus for DistributedFileSystem
[ https://issues.apache.org/jira/browse/SPARK-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-27801.
Resolution: Fixed
Assignee: Rob Russo
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/24672

> InMemoryFileIndex.listLeafFiles should use listLocatedStatus for DistributedFileSystem
>
> Key: SPARK-27801
> URL: https://issues.apache.org/jira/browse/SPARK-27801
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.3
> Reporter: Rob Russo
> Assignee: Rob Russo
> Priority: Major
> Fix For: 3.0.0
>
> Currently in InMemoryFileIndex, all directory listings are done using
> FileSystem.listStatus followed by individual calls to
> FileSystem.getFileBlockLocations. This is painfully slow for folders that
> have large numbers of files, because this process happens serially and
> parallelism is only applied at the folder level, not the file level.
>
> FileSystem also provides another API, listLocatedStatus, which returns
> LocatedFileStatus objects that already include the block locations. In the
> base FileSystem class this just delegates to listStatus and
> getFileBlockLocations, much as Spark does. However, when HDFS is the backing
> file system, DistributedFileSystem overrides this method and makes a single
> call to the namenode to retrieve the directory listing together with the
> block locations. This avoids potentially thousands of calls (or more) to the
> namenode, and it is also more consistent: a file either exists with its
> locations or does not exist, instead of raising a FileNotFoundException when
> a file disappears between the listing and the location lookup.
>
> For our example directory with 6500 files, the load time of
> spark.read.parquet was reduced 96x, from 76 seconds to 0.8 seconds. The
> savings only grow with the number of files in the directory.
>
> In the pull request, rather than always using this method, which in the
> default FileSystem implementation could lead to a FileNotFoundException that
> is hard to decipher, the method is used only when the FileSystem is a
> DistributedFileSystem; otherwise the old logic still applies.
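The difference in round trips can be modeled without a cluster. In the hypothetical sketch below, a fake name node counts remote calls: the per-file strategy issues one listing call plus one block-location call per file, while the located-listing strategy returns locations together with the listing in one call, which is the behavior the description attributes to DistributedFileSystem.listLocatedStatus. FakeNameNode and its methods are illustrative stand-ins that only mirror the Hadoop method names; they are not Hadoop's FileSystem API, and a real name node may return very large listings in several batches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cost model: counts name node round trips for two listing strategies.
// FakeNameNode is an illustrative stand-in, not Hadoop's FileSystem API.
public class ListingCostModel {
    static final class FakeNameNode {
        private final Map<String, List<String>> locationsByFile = new LinkedHashMap<>();
        int calls = 0;

        FakeNameNode(int numFiles) {
            for (int i = 0; i < numFiles; i++) {
                locationsByFile.put("part-" + i + ".parquet", List.of("datanode-" + (i % 3)));
            }
        }

        // Like listStatus: names only, one round trip.
        List<String> listStatus() {
            calls++;
            return new ArrayList<>(locationsByFile.keySet());
        }

        // Like getFileBlockLocations: one round trip per file.
        List<String> getFileBlockLocations(String file) {
            calls++;
            return locationsByFile.get(file);
        }

        // Like listLocatedStatus as described for HDFS: names and locations together.
        Map<String, List<String>> listLocatedStatus() {
            calls++;
            return new LinkedHashMap<>(locationsByFile);
        }
    }

    // Old strategy: list the directory, then fetch block locations file by file.
    static int callsWithPerFileLookups(int numFiles) {
        FakeNameNode nn = new FakeNameNode(numFiles);
        Map<String, List<String>> listing = new LinkedHashMap<>();
        for (String file : nn.listStatus()) {
            listing.put(file, nn.getFileBlockLocations(file));
        }
        return nn.calls;
    }

    // New strategy: one located listing.
    static int callsWithLocatedListing(int numFiles) {
        FakeNameNode nn = new FakeNameNode(numFiles);
        Map<String, List<String>> listing = nn.listLocatedStatus();
        return nn.calls;
    }

    public static void main(String[] args) {
        int files = 6500; // file count taken from the report above
        System.out.println("listStatus + getFileBlockLocations: " + callsWithPerFileLookups(files) + " calls");
        System.out.println("listLocatedStatus:                  " + callsWithLocatedListing(files) + " calls");
    }
}
```

The model only counts calls; it says nothing about per-call latency, which is why the measured speed-up in the report differs from the ratio of call counts.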
[jira] [Resolved] (SPARK-27840) Hadoop attempts to create a temporary folder in root folder
[ https://issues.apache.org/jira/browse/SPARK-27840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

M. Le Bihan resolved SPARK-27840.
Resolution: Not A Bug

It was my own bug...

> Hadoop attempts to create a temporary folder in root folder
>
> Key: SPARK-27840
> URL: https://issues.apache.org/jira/browse/SPARK-27840
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.3
> Reporter: M. Le Bihan
> Priority: Major
>
> I have a REST web service that calls a Spring Boot service.
>
> {code:java}
> /**
>  * Export the income statements and activities by NAF level, for a series of intercommunalities, to a CSV file.
>  * @param anneeCOG COG year.
>  * @param anneeSIRENE Year of the SIRENE data to take into account when extracting company/establishment data.
>  * @param anneeComptesResultats Year of the income statement data to take into account.
>  * @param niveauNAF NAF level used for grouping.
>  * @param codesIntercommunalites EPCI / SIREN codes of the intercommunalities.
>  * @return Export file of the income statements and main activities of the municipalities, by NAF level.
>  * @throws IntercommunaliteAbsenteDansCommuneSiegeException if the requested intercommunality does not exist.
>  * @throws TechniqueException if a technical failure occurs.
>  * @throws IOException
>  */
> @RequestMapping("/activites/communes/exporterActivitesIntercommunalites")
> @Produces("text/csv")
> public String exporterCSV(@RequestParam(name="anneeCOG") int anneeCOG, @RequestParam(name="anneeSIRENE") int anneeSIRENE,
>     @RequestParam(name="anneeComptesResultats") int anneeComptesResultats, @RequestParam(name="niveauNAF") int niveauNAF,
>     @RequestParam(name="codesIntercommunalites") String[] codesIntercommunalites)
>     throws IntercommunaliteAbsenteDansCommuneSiegeException, TechniqueException, IOException {
>   SIRENCommune[] sirenIntercommunalites = new SIRENCommune[codesIntercommunalites.length];
>
>   for(int index=0; index < codesIntercommunalites.length; index ++) {
>     sirenIntercommunalites[index] = new SIRENCommune(codesIntercommunalites[index]);
>   }
>
>   File tempCSV = new File(this.environnement.getProperty("java.io.tmpdir") + MessageFormat.format("{0,number,#0}", System.currentTimeMillis()));
>   File sortieCSV = this.impactActivitesCommunalesService.exporterCSV(tempCSV, anneeCOG, anneeSIRENE, anneeComptesResultats, niveauNAF, sirenIntercommunalites);
>
>   StringBuilder contenuCSV = new StringBuilder();
>
>   try(Stream<String> stream = Files.lines(sortieCSV.toPath(), StandardCharsets.UTF_8)) {
>     stream.forEach(s -> contenuCSV.append(s).append("\n"));
>   }
>
>   return contenuCSV.toString();
> }{code}
>
> The Spring service creates a Dataset, then a CSV file from it, and returns
> that CSV to the REST web service (it will have only 40 to 50 lines).
>
> {code:java}
> /**
>  * Export the income statements and activities by NAF level, for a series of intercommunalities, to a CSV file.
>  * @param anneeCOG COG year.
>  * @param anneeSIRENE Year of the SIRENE data to take into account when extracting company/establishment data.
>  * @param anneeComptesResultats Year of the income statement data to take into account.
>  * @param niveauNAF NAF level used for grouping.
>  * @param codesIntercommunalites EPCI / SIREN codes of the intercommunalities.
>  * @return Export file of the income statements and main activities of the municipalities, by NAF level.
>  * @throws IntercommunaliteAbsenteDansCommuneSiegeException if the requested intercommunality does not exist.
>  * @throws TechniqueException if a technical failure occurs.
>  */
> public File exporterCSV(File sortieCSV, int anneeCOG, int anneeSIRENE, int anneeComptesResultats, int niveauNAF, SIRENCommune... codesIntercommunalites)
>     throws IntercommunaliteAbsenteDansCommuneSiegeException, TechniqueException {
>   Objects.requireNonNull(sortieCSV, "The output CSV file cannot be null.");
>
>   JavaPairRDD ActivitesCommunaleParNAF>> intercos = rddActivitesEtComptesResultatsCommunes(anneeCOG, anneeSIRENE, anneeComptesResultats, niveauNAF, codesIntercommunalites);
>   Dataset ds = toDataset(anneeCOG, intercos);
>   ds.coalesce(1).write().mode(SaveMode.Overwrite).option("header", "true").option("quoteMode", "NON_NUMERIC").option("quote", "\"").csv(sortieCSV.getAbsolutePath());
>
>   // List the .csv files that were produced.
>   try {
>     List
[jira] [Created] (SPARK-27840) Hadoop attempts to create a temporary folder in root folder
M. Le Bihan created SPARK-27840:

Summary: Hadoop attempts to create a temporary folder in root folder
Key: SPARK-27840
URL: https://issues.apache.org/jira/browse/SPARK-27840
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.3
Reporter: M. Le Bihan

I have a REST web service that calls a Spring Boot service.

{code:java}
/**
 * Export the income statements and activities by NAF level, for a series of intercommunalities, to a CSV file.
 * @param anneeCOG COG year.
 * @param anneeSIRENE Year of the SIRENE data to take into account when extracting company/establishment data.
 * @param anneeComptesResultats Year of the income statement data to take into account.
 * @param niveauNAF NAF level used for grouping.
 * @param codesIntercommunalites EPCI / SIREN codes of the intercommunalities.
 * @return Export file of the income statements and main activities of the municipalities, by NAF level.
 * @throws IntercommunaliteAbsenteDansCommuneSiegeException if the requested intercommunality does not exist.
 * @throws TechniqueException if a technical failure occurs.
 * @throws IOException
 */
@RequestMapping("/activites/communes/exporterActivitesIntercommunalites")
@Produces("text/csv")
public String exporterCSV(@RequestParam(name="anneeCOG") int anneeCOG, @RequestParam(name="anneeSIRENE") int anneeSIRENE,
    @RequestParam(name="anneeComptesResultats") int anneeComptesResultats, @RequestParam(name="niveauNAF") int niveauNAF,
    @RequestParam(name="codesIntercommunalites") String[] codesIntercommunalites)
    throws IntercommunaliteAbsenteDansCommuneSiegeException, TechniqueException, IOException {
  SIRENCommune[] sirenIntercommunalites = new SIRENCommune[codesIntercommunalites.length];

  for(int index=0; index < codesIntercommunalites.length; index ++) {
    sirenIntercommunalites[index] = new SIRENCommune(codesIntercommunalites[index]);
  }

  File tempCSV = new File(this.environnement.getProperty("java.io.tmpdir") + MessageFormat.format("{0,number,#0}", System.currentTimeMillis()));
  File sortieCSV = this.impactActivitesCommunalesService.exporterCSV(tempCSV, anneeCOG, anneeSIRENE, anneeComptesResultats, niveauNAF, sirenIntercommunalites);

  StringBuilder contenuCSV = new StringBuilder();

  try(Stream<String> stream = Files.lines(sortieCSV.toPath(), StandardCharsets.UTF_8)) {
    stream.forEach(s -> contenuCSV.append(s).append("\n"));
  }

  return contenuCSV.toString();
}{code}

The Spring service creates a Dataset, then a CSV file from it, and returns that CSV to the REST web service (it will have only 40 to 50 lines).

{code:java}
/**
 * Export the income statements and activities by NAF level, for a series of intercommunalities, to a CSV file.
 * @param anneeCOG COG year.
 * @param anneeSIRENE Year of the SIRENE data to take into account when extracting company/establishment data.
 * @param anneeComptesResultats Year of the income statement data to take into account.
 * @param niveauNAF NAF level used for grouping.
 * @param codesIntercommunalites EPCI / SIREN codes of the intercommunalities.
 * @return Export file of the income statements and main activities of the municipalities, by NAF level.
 * @throws IntercommunaliteAbsenteDansCommuneSiegeException if the requested intercommunality does not exist.
 * @throws TechniqueException if a technical failure occurs.
 */
public File exporterCSV(File sortieCSV, int anneeCOG, int anneeSIRENE, int anneeComptesResultats, int niveauNAF, SIRENCommune... codesIntercommunalites)
    throws IntercommunaliteAbsenteDansCommuneSiegeException, TechniqueException {
  Objects.requireNonNull(sortieCSV, "The output CSV file cannot be null.");

  JavaPairRDD> intercos = rddActivitesEtComptesResultatsCommunes(anneeCOG, anneeSIRENE, anneeComptesResultats, niveauNAF, codesIntercommunalites);
  Dataset ds = toDataset(anneeCOG, intercos);
  ds.coalesce(1).write().mode(SaveMode.Overwrite).option("header", "true").option("quoteMode", "NON_NUMERIC").option("quote", "\"").csv(sortieCSV.getAbsolutePath());

  // List the .csv files that were produced.
  try {
    List<File> fichiersCSV = Files.walk(sortieCSV.toPath()) // Search the output directory
      .map(c -> c.toFile()) // for Paths converted to Files,
      .filter(c -> c.isDirectory() == false && c.getName().endsWith(".csv")) // that are CSV files,
      .collect(Collectors.toList()); // and return them as a list.
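The thread does not say what the reporter's own bug was. One possibility worth checking in code like the above (an assumption, not confirmed by the reporter) is the output path: java.io.tmpdir is typically /tmp on Linux, with no trailing separator, so tmpdir + timestamp names a sibling of /tmp directly under / (for example /tmp followed immediately by the digits), and the Hadoop output committer would then create its _temporary directory there. The sketch assumes the Spring environnement.getProperty call returns the JVM system property; TempPaths, concatenated and resolved are hypothetical helpers contrasting string concatenation with java.nio's Path.resolve.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helpers contrasting string concatenation with Path.resolve
// when building an output directory under java.io.tmpdir.
public class TempPaths {
    // Pattern from the report: no separator is inserted between the two parts.
    static String concatenated(String tmpDir, long millis) {
        return tmpDir + millis;
    }

    // Alternative: Path.resolve inserts the separator when it is needed.
    static Path resolved(String tmpDir, long millis) {
        return Paths.get(tmpDir).resolve(Long.toString(millis));
    }

    public static void main(String[] args) {
        String tmpDir = System.getProperty("java.io.tmpdir");
        long now = System.currentTimeMillis();
        System.out.println("concatenated: " + concatenated(tmpDir, now));
        System.out.println("resolved:     " + resolved(tmpDir, now));
    }
}
```

Path.resolve behaves the same whether or not the configured temporary directory ends with a separator, which string concatenation does not.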
[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848200#comment-16848200 ]

Jason Ferrell commented on SPARK-27837:

Is there any way to turn this into an enhancement request? Having to pass a literal that applies to every single row of an operation doesn't seem very useful in SQL code. The behavior I think would be most useful is to pass an int from a column of the row, so that I get a random value for each row, seeded by a value on that row.

> Running rand() in SQL with seed of column results in error (rand(col1))
>
> Key: SPARK-27837
> URL: https://issues.apache.org/jira/browse/SPARK-27837
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Jason Ferrell
> Priority: Major
>
> Running this SQL:
> with a as
> (
> select 123 val1
> union all
> select 123 val1
> union all
> select 123 val1
> )
> select val1,rand(123),rand(val1)
> from a
>
> Results in error: org.apache.spark.sql.AnalysisException: Input argument to
> rand must be an integer, long or null literal.;
>
> It doesn't appear to recognize the value of the column as an int.
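The per-row seeding requested in this comment can be sketched outside Spark: derive each row's value from a fresh generator seeded with that row's column value, so equal seeds always give equal outputs. The sketch uses java.util.Random and a hypothetical seededPerRow helper; it is neither a Spark API nor how Spark's rand() works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of "one random value per row, seeded by that row's value".
// java.util.Random is used for simplicity; this is not a Spark API.
public class PerRowSeed {
    static List<Double> seededPerRow(List<Long> seedColumn) {
        List<Double> values = new ArrayList<>();
        for (long seed : seedColumn) {
            // A fresh generator per row makes the output a pure function of the seed.
            values.add(new Random(seed).nextDouble());
        }
        return values;
    }

    public static void main(String[] args) {
        // Rows 1 and 2 share a seed, so they get identical values.
        System.out.println(seededPerRow(List.of(123L, 123L, 456L)));
    }
}
```

Creating one generator per row is simple but allocates an object per row; the point here is only the semantics, not the cost.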
[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848125#comment-16848125 ]

Liang-Chi Hsieh commented on SPARK-27837:

Please see the analysis exception: Input argument to rand must be an integer, long or null literal.

> Running rand() in SQL with seed of column results in error (rand(col1))
>
> Key: SPARK-27837
> URL: https://issues.apache.org/jira/browse/SPARK-27837
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Jason Ferrell
> Priority: Major
>
> Running this sql:
> with a as
> (
> select 123 val1
> union all
> select 123 val1
> union all
> select 123 val1
> )
> select val1,rand(123),rand(val1)
> from a
>
> Results in error: org.apache.spark.sql.AnalysisException: Input argument to
> rand must be an integer, long or null literal.;
>
> It doesn't appear to recognize the value of the column as an int.
[jira] [Commented] (SPARK-27836) Issue with seeded rand() function in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-27836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848124#comment-16848124 ]

Liang-Chi Hsieh commented on SPARK-27836:

The rand function is initialized only once, with the given seed, at the beginning of the task. It then generates a new random value for each row.

> Issue with seeded rand() function in Spark SQL
>
> Key: SPARK-27836
> URL: https://issues.apache.org/jira/browse/SPARK-27836
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Jason Ferrell
> Priority: Major
>
> This SQL:
> with a as
> (
> select 123 val1
> union all
> select 123 val1
> union all
> select 123 val1
> )
> select val1,rand(123)
> from a
>
> Results in:
> |val1|rand(123)|
> |123|0.502953|
> |123|0.52307|
> |123|0.853569|
>
> It should result in three rows all with value 0.502953
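The explanation above can be illustrated with java.util.Random standing in for Spark's internal generator: seeding once and then drawing one value per row gives a reproducible sequence of different values, whereas the reporter's expected output (the same value on every row) would require re-seeding for every row. The helpers oncePerTask and reseedEveryRow are hypothetical, and because Spark uses a different generator internally, the numbers will not match the table in the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical model of seeded random generation, with java.util.Random as a stand-in.
public class SeededRandDemo {
    // Behavior described in the comment: one generator per task, seeded once.
    static List<Double> oncePerTask(long seed, int rows) {
        Random rng = new Random(seed);
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            out.add(rng.nextDouble()); // each row advances the same generator
        }
        return out;
    }

    // Reporter's expectation: a new generator with the same seed for every row.
    static List<Double> reseedEveryRow(long seed, int rows) {
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            out.add(new Random(seed).nextDouble()); // always the first draw
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("once per task:  " + oncePerTask(123L, 3));
        System.out.println("reseed per row: " + reseedEveryRow(123L, 3));
    }
}
```

The first value is the same in both lists; only the per-task version moves on to new values for the following rows, which matches the shape of the reported output.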
[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848123#comment-16848123 ]

Liang-Chi Hsieh commented on SPARK-27837:

The problem isn't that val1 isn't an int, but that it isn't a literal. The seed passed to rand must be a literal.

> Running rand() in SQL with seed of column results in error (rand(col1))
>
> Key: SPARK-27837
> URL: https://issues.apache.org/jira/browse/SPARK-27837
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Jason Ferrell
> Priority: Major
>
> Running this sql:
> with a as
> (
> select 123 val1
> union all
> select 123 val1
> union all
> select 123 val1
> )
> select val1,rand(123),rand(val1)
> from a
>
> Results in error: org.apache.spark.sql.AnalysisException: Input argument to
> rand must be an integer, long or null literal.;
>
> It doesn't appear to recognize the value of the column as an int.
[jira] [Commented] (SPARK-24149) Automatic namespaces discovery in HDFS federation
[ https://issues.apache.org/jira/browse/SPARK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848097#comment-16848097 ]

Marco Gaido commented on SPARK-24149:

[~Dhruve Ashar] the use case for this change is, for instance, a partitioned table whose partitions are on different namespaces when no viewFS is configured. In that case, a user running a query on that table may or may not get an exception when reading it. Please note that the user running a query may be different from the user who created the table, so he/she may not be aware of this situation, and understanding the problem may be pretty hard.

> Automatic namespaces discovery in HDFS federation
>
> Key: SPARK-24149
> URL: https://issues.apache.org/jira/browse/SPARK-24149
> Project: Spark
> Issue Type: Improvement
> Components: YARN
> Affects Versions: 2.4.0
> Reporter: Marco Gaido
> Assignee: Marco Gaido
> Priority: Minor
> Fix For: 2.4.0
>
> Hadoop 3 introduced HDFS federation.
> Spark fails to write on different namespaces when Hadoop federation is turned
> on and the cluster is secure. This happens because Spark looks for the
> delegation token only for the defaultFS configured and not for all the
> available namespaces. A workaround is the usage of the property
> {{spark.yarn.access.hadoopFileSystems}}.
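As a rough illustration of what discovering the namespaces could involve, the sketch below reads Hadoop's comma-separated dfs.nameservices property and turns each nameservice ID into an hdfs:// URI, producing the kind of comma-separated list a user would otherwise have to spell out in spark.yarn.access.hadoopFileSystems. The Hadoop configuration is modeled as a plain Map, and FederatedNamespaces with its discover and toAccessList methods are hypothetical helpers, not the code merged for this issue.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: derive one hdfs:// URI per configured nameservice.
// The Hadoop configuration is modeled as a plain Map (an assumption).
public class FederatedNamespaces {
    static List<URI> discover(Map<String, String> hadoopConf) {
        List<URI> uris = new ArrayList<>();
        String services = hadoopConf.getOrDefault("dfs.nameservices", "");
        for (String ns : services.split(",")) {
            String id = ns.trim();
            if (!id.isEmpty()) {
                uris.add(URI.create("hdfs://" + id)); // one file system per namespace
            }
        }
        return uris;
    }

    // Comma-joined form, as used by spark.yarn.access.hadoopFileSystems.
    static String toAccessList(List<URI> fileSystems) {
        return fileSystems.stream().map(URI::toString).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("dfs.nameservices", "ns1, ns2");
        System.out.println(toAccessList(discover(conf)));
    }
}
```

Delegation tokens would then be requested for every URI in the list rather than only for the defaultFS.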
[jira] [Assigned] (SPARK-27839) Improve UTF8String.replace() / StringReplace performance
[ https://issues.apache.org/jira/browse/SPARK-27839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27839:
Assignee: Josh Rosen (was: Apache Spark)

> Improve UTF8String.replace() / StringReplace performance
>
> Key: SPARK-27839
> URL: https://issues.apache.org/jira/browse/SPARK-27839
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Josh Rosen
> Assignee: Josh Rosen
> Priority: Major
>
> The UTF8String.replace() function and StringReplace expression are missing a
> few common-case optimizations, such as avoiding copies when the replacement
> does not change the string and avoiding redundant copying / decoding of the
> search and replacement strings in case they are constants.
> I think there's room to significantly improve performance here, especially
> for single-character replacements.
[jira] [Assigned] (SPARK-27839) Improve UTF8String.replace() / StringReplace performance
[ https://issues.apache.org/jira/browse/SPARK-27839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27839:
Assignee: Apache Spark (was: Josh Rosen)

> Improve UTF8String.replace() / StringReplace performance
>
> Key: SPARK-27839
> URL: https://issues.apache.org/jira/browse/SPARK-27839
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Josh Rosen
> Assignee: Apache Spark
> Priority: Major
>
> The UTF8String.replace() function and StringReplace expression are missing a
> few common-case optimizations, such as avoiding copies when the replacement
> does not change the string and avoiding redundant copying / decoding of the
> search and replacement strings in case they are constants.
> I think there's room to significantly improve performance here, especially
> for single-character replacements.
[jira] [Created] (SPARK-27839) Improve UTF8String.replace() / StringReplace performance
Josh Rosen created SPARK-27839:

Summary: Improve UTF8String.replace() / StringReplace performance
Key: SPARK-27839
URL: https://issues.apache.org/jira/browse/SPARK-27839
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Josh Rosen
Assignee: Josh Rosen

The UTF8String.replace() function and StringReplace expression are missing a few common-case optimizations, such as avoiding copies when the replacement does not change the string and avoiding redundant copying / decoding of the search and replacement strings in case they are constants.

I think there's room to significantly improve performance here, especially for single-character replacements.
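Two of the optimizations mentioned can be sketched on raw UTF-8 byte arrays: returning the input unchanged when the search string does not occur, and a dedicated path when both the search and replacement strings are a single byte. ByteReplace.replace is a hypothetical helper, not UTF8String.replace(). It treats an empty search string as a no-op, and it relies on the fact that in valid UTF-8 a byte-wise match of a complete encoded search string cannot start in the middle of another character.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of replace() fast paths on UTF-8 bytes (not Spark's UTF8String).
public class ByteReplace {
    static byte[] replace(byte[] src, byte[] search, byte[] replacement) {
        if (search.length == 0) {
            return src; // treated as a no-op in this sketch
        }
        int first = indexOf(src, search, 0);
        if (first < 0) {
            return src; // no match: return the input itself, no copy at all
        }
        if (search.length == 1 && replacement.length == 1) {
            // Single-byte fast path: the length is unchanged, so copy once and patch in place.
            byte[] out = src.clone();
            for (int i = first; i < out.length; i++) {
                if (out[i] == search[0]) {
                    out[i] = replacement[0];
                }
            }
            return out;
        }
        // General path: copy the unmatched segments and the replacement for each match.
        ByteArrayOutputStream out = new ByteArrayOutputStream(src.length);
        int start = 0;
        for (int pos = first; pos >= 0; pos = indexOf(src, search, start)) {
            out.write(src, start, pos - start);
            out.write(replacement, 0, replacement.length);
            start = pos + search.length;
        }
        out.write(src, start, src.length - start);
        return out.toByteArray();
    }

    // Naive byte-wise search for search in src, starting at index from.
    static int indexOf(byte[] src, byte[] search, int from) {
        outer:
        for (int i = from; i <= src.length - search.length; i++) {
            for (int j = 0; j < search.length; j++) {
                if (src[i + j] != search[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] path = utf8("a.b.c");
        System.out.println(new String(replace(path, utf8("."), utf8("/")), StandardCharsets.UTF_8));
        System.out.println(new String(replace(path, utf8("."), utf8("::")), StandardCharsets.UTF_8));
        System.out.println(replace(path, utf8("x"), utf8("y")) == path); // no match: same array
    }
}
```

Returning the very same array on a miss is what makes the no-change case free; callers must then treat the result as immutable, as UTF8String values generally are.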