[jira] [Updated] (SPARK-27322) DataSourceV2: Select from multiple catalogs

2019-05-25 Thread John Zhuge (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated SPARK-27322:
---
Summary: DataSourceV2: Select from multiple catalogs  (was: DataSourceV2: 
Logical relation in multiple catalogs)

> DataSourceV2: Select from multiple catalogs
> ---
>
> Key: SPARK-27322
> URL: https://issues.apache.org/jira/browse/SPARK-27322
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: John Zhuge
>Priority: Major
>
> Support multi-catalog in the following SELECT code paths:
>  * SELECT * FROM catalog.db.tbl
>  * TABLE catalog.db.tbl
>  * JOIN or UNION tables from different catalogs
>  * SparkSession.table("catalog.db.tbl")
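
For readers following along, here is a minimal sketch of what these code paths look like from user code, assuming a DataSourceV2 catalog plugin has been registered under a name via the spark.sql.catalog.<name> configuration; the catalog names, plugin classes, and table/column names below are placeholders, not part of this ticket.

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MultiCatalogSelectSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("multi-catalog-select")
        // Hypothetical catalog registrations; the plugin classes are placeholders.
        .config("spark.sql.catalog.testcat", "com.example.MyCatalog")
        .config("spark.sql.catalog.othercat", "com.example.OtherCatalog")
        .getOrCreate();

    // SELECT with a catalog-qualified, three-part identifier.
    Dataset<Row> a = spark.sql("SELECT * FROM testcat.db.tbl");
    // The same code path through the programmatic API.
    Dataset<Row> b = spark.table("testcat.db.tbl");
    // JOIN across tables that live in different catalogs.
    Dataset<Row> joined = spark.sql(
        "SELECT * FROM testcat.db.tbl t JOIN othercat.db.other o ON t.id = o.id");
    joined.show();
  }
}
{code}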






[jira] [Updated] (SPARK-27147) Create new unit test cases for SortShuffleWriter

2019-05-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27147:
--
Component/s: Tests

> Create new unit test cases for SortShuffleWriter
> 
>
> Key: SPARK-27147
> URL: https://issues.apache.org/jira/browse/SPARK-27147
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core, Tests
>Affects Versions: 3.0.0
>Reporter: wangjiaochun
>Assignee: wangjiaochun
>Priority: Minor
> Fix For: 3.0.0
>
>
> Create new unit test cases for SortShuffleWriter






[jira] [Updated] (SPARK-27147) Create new unit test cases for SortShuffleWriter

2019-05-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27147:
--
Issue Type: Improvement  (was: Test)

> Create new unit test cases for SortShuffleWriter
> 
>
> Key: SPARK-27147
> URL: https://issues.apache.org/jira/browse/SPARK-27147
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Tests
>Affects Versions: 3.0.0
>Reporter: wangjiaochun
>Assignee: wangjiaochun
>Priority: Minor
> Fix For: 3.0.0
>
>
> Create new unit test cases for SortShuffleWriter






[jira] [Resolved] (SPARK-27147) Create new unit test cases for SortShuffleWriter

2019-05-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27147.
---
Resolution: Fixed
  Assignee: wangjiaochun

This is resolved via https://github.com/apache/spark/pull/24080

> Create new unit test cases for SortShuffleWriter
> 
>
> Key: SPARK-27147
> URL: https://issues.apache.org/jira/browse/SPARK-27147
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: wangjiaochun
>Assignee: wangjiaochun
>Priority: Minor
> Fix For: 3.0.0
>
>
> Create new unit test cases for SortShuffleWriter






[jira] [Assigned] (SPARK-27841) Improve UTF8String fromString()/toString()/numChars() performance when strings are ASCII

2019-05-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27841:


Assignee: Josh Rosen  (was: Apache Spark)

> Improve UTF8String fromString()/toString()/numChars() performance when 
> strings are ASCII
> 
>
> Key: SPARK-27841
> URL: https://issues.apache.org/jira/browse/SPARK-27841
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>
> UTF8String's fromString(), toString(), and numChars() methods are performance 
> hotspots. For strings which consist entirely of ASCII characters we can make 
> performance optimizations which significantly reduce memory allocation and 
> copying, greatly improving performance for many common workloads.






[jira] [Assigned] (SPARK-27841) Improve UTF8String fromString()/toString()/numChars() performance when strings are ASCII

2019-05-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27841:


Assignee: Apache Spark  (was: Josh Rosen)

> Improve UTF8String fromString()/toString()/numChars() performance when 
> strings are ASCII
> 
>
> Key: SPARK-27841
> URL: https://issues.apache.org/jira/browse/SPARK-27841
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Assignee: Apache Spark
>Priority: Major
>
> UTF8String's fromString(), toString(), and numChars() methods are performance 
> hotspots. For strings which consist entirely of ASCII characters we can make 
> performance optimizations which significantly reduce memory allocation and 
> copying, greatly improving performance for many common workloads.






[jira] [Resolved] (SPARK-27711) InputFileBlockHolder should be unset at the end of tasks

2019-05-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27711.
---
   Resolution: Fixed
 Assignee: Jose Torres
Fix Version/s: 3.0.0
   2.4.4

This is resolved via 
- https://github.com/apache/spark/pull/24605 (master)
- https://github.com/apache/spark/pull/24690 (branch-2.4)

> InputFileBlockHolder should be unset at the end of tasks
> 
>
> Key: SPARK-27711
> URL: https://issues.apache.org/jira/browse/SPARK-27711
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 2.4.3
>Reporter: Jose Torres
>Assignee: Jose Torres
>Priority: Major
> Fix For: 2.4.4, 3.0.0
>
>
> InputFileBlockHolder should be unset at the end of each task. Otherwise the 
> value of input_file_name() can leak over to other tasks instead of starting 
> out as an empty string.
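
A minimal sketch of the pattern the title describes, assuming only that InputFileBlockHolder exposes the unset() call the ticket refers to; the wrapper below is hypothetical and just shows where the reset belongs (the real change lives in Spark's task-completion path in the executor).

{code:java}
import org.apache.spark.rdd.InputFileBlockHolder;

public class TaskCleanupSketch {
  // Hypothetical wrapper around a task body, illustrating the fix pattern.
  static void runTask(Runnable taskBody) {
    try {
      taskBody.run();
    } finally {
      // Clear the thread-local file/block info so the value of
      // input_file_name() cannot leak into the next task on this thread.
      InputFileBlockHolder.unset();
    }
  }
}
{code}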






[jira] [Created] (SPARK-27841) Improve UTF8String fromString()/toString()/numChars() performance when strings are ASCII

2019-05-25 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27841:
--

 Summary: Improve UTF8String fromString()/toString()/numChars() 
performance when strings are ASCII
 Key: SPARK-27841
 URL: https://issues.apache.org/jira/browse/SPARK-27841
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Josh Rosen
Assignee: Josh Rosen


UTF8String's fromString(), toString(), and numChars() methods are performance 
hotspots. For strings which consist entirely of ASCII characters we can make 
performance optimizations which significantly reduce memory allocation and 
copying, greatly improving performance for many common workloads.
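
A stand-alone sketch of the ASCII fast path being described, not Spark's actual UTF8String code: when every byte is below 0x80, the character count equals the byte count, and the bytes can be decoded with the cheap US-ASCII charset instead of a full UTF-8 decoder.

{code:java}
import java.nio.charset.StandardCharsets;

public class AsciiFastPathSketch {
  static boolean isAllAscii(byte[] utf8) {
    for (byte b : utf8) {
      if (b < 0) {          // high bit set => non-ASCII byte, use the general path
        return false;
      }
    }
    return true;
  }

  static int numChars(byte[] utf8) {
    if (isAllAscii(utf8)) {
      return utf8.length;   // one byte per character for pure-ASCII data
    }
    // General path: decode and count code points.
    String s = new String(utf8, StandardCharsets.UTF_8);
    return s.codePointCount(0, s.length());
  }

  static String toJavaString(byte[] utf8) {
    return isAllAscii(utf8)
        ? new String(utf8, StandardCharsets.US_ASCII)  // cheap decode, no UTF-8 state machine
        : new String(utf8, StandardCharsets.UTF_8);    // general path
  }
}
{code}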






[jira] [Resolved] (SPARK-27801) InMemoryFileIndex.listLeafFiles should use listLocatedStatus for DistributedFileSystem

2019-05-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27801.
---
   Resolution: Fixed
 Assignee: Rob Russo
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/24672

> InMemoryFileIndex.listLeafFiles should use listLocatedStatus for 
> DistributedFileSystem
> --
>
> Key: SPARK-27801
> URL: https://issues.apache.org/jira/browse/SPARK-27801
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Rob Russo
>Assignee: Rob Russo
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently InMemoryFileIndex performs every directory listing with 
> FileSystem.listStatus followed by individual calls to 
> FileSystem.getFileBlockLocations. This is painfully slow for folders with 
> large numbers of files, because the per-file calls happen serially and 
> parallelism is only applied at the folder level, not the file level.
> FileSystem also provides another API, listLocatedStatus, which returns 
> LocatedFileStatus objects that already carry the block locations. In the base 
> FileSystem class it simply delegates to listStatus and getFileBlockLocations, 
> much like Spark does today. When HDFS specifically is the backing file 
> system, however, DistributedFileSystem overrides this method and makes a 
> single call to the namenode to retrieve the directory listing together with 
> the block locations. This avoids potentially thousands or more namenode calls 
> and is also more consistent: a file either exists with its locations or does 
> not exist at all, removing the FileNotFoundException corner case.
> For our example directory with 6500 files, the load time of 
> spark.read.parquet dropped 96x, from 76 seconds to 0.8 seconds. The savings 
> only grows with the number of files in the directory.
> Rather than always using this method, which in the default FileSystem 
> implementation could lead to a FileNotFoundException that is hard to 
> decipher, the pull request uses it only when the FileSystem is a 
> DistributedFileSystem; otherwise the old logic still applies.
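
A minimal sketch of the listing strategy described above, using only standard Hadoop FileSystem calls: take the single-round-trip listLocatedStatus() path when the file system is HDFS, and fall back to listStatus() plus per-file getFileBlockLocations() otherwise. The helper itself is illustrative, not the code from the pull request.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ListLeafFilesSketch {
  static List<LocatedFileStatus> listWithLocations(FileSystem fs, Path dir) throws IOException {
    List<LocatedFileStatus> result = new ArrayList<>();
    if (fs instanceof DistributedFileSystem) {
      // HDFS: one namenode call returns the statuses together with their block locations.
      RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
      while (it.hasNext()) {
        result.add(it.next());
      }
    } else {
      // Other file systems: list first, then fetch block locations file by file.
      for (FileStatus status : fs.listStatus(dir)) {
        BlockLocation[] locations = fs.getFileBlockLocations(status, 0, status.getLen());
        result.add(new LocatedFileStatus(status, locations));
      }
    }
    return result;
  }
}
{code}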






[jira] [Resolved] (SPARK-27840) Hadoop attempts to create a temporary folder in root folder

2019-05-25 Thread M. Le Bihan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

M. Le Bihan resolved SPARK-27840.
-
Resolution: Not A Bug

It was my own bug...
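
The ticket does not say what the mistake actually was, so the following is only a guess from the snippet quoted below: concatenating java.io.tmpdir and a timestamp without a path separator can produce a path such as /tmp1558700000000 directly under the root folder, which would match the symptom in the title. A sketch of the safer two-argument File constructor:

{code:java}
import java.io.File;
import java.text.MessageFormat;

public class TempDirSketch {
  public static void main(String[] args) {
    // Treat java.io.tmpdir as the parent directory and the timestamp as the
    // child name, so no separator can be lost in string concatenation.
    File tempCSV = new File(
        System.getProperty("java.io.tmpdir"),
        MessageFormat.format("{0,number,#0}", System.currentTimeMillis()));
    System.out.println(tempCSV.getAbsolutePath());
  }
}
{code}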

> Hadoop attempts to create a temporary folder in root folder
> ---
>
> Key: SPARK-27840
> URL: https://issues.apache.org/jira/browse/SPARK-27840
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: M. Le Bihan
>Priority: Major
>
> I have a REST web-service that calls a Spring-boot service.
>  
> {code:java}
>    /**
>     * Export the income statements and activities by NAF level, for a series of
>     * inter-municipalities, to a CSV file.
>     * @param anneeCOG Year of the COG.
>     * @param anneeSIRENE Year of the SIRENE data to use when extracting the
>     *                    company/establishment data.
>     * @param anneeComptesResultats Year of the income-statement data to use.
>     * @param niveauNAF NAF grouping level.
>     * @param codesIntercommunalites EPCI / SIREN codes of the inter-municipalities.
>     * @return Export file of the income statements and main activities of the
>     *         municipalities, by NAF level.
>     * @throws IntercommunaliteAbsenteDansCommuneSiegeException if the requested
>     *         inter-municipality does not exist.
>     * @throws TechniqueException if a technical incident occurs.
>     * @throws IOException
>     */
>    @RequestMapping("/activites/communes/exporterActivitesIntercommunalites")
>    @Produces("text/csv")
>    public String exporterCSV(@RequestParam(name="anneeCOG") int anneeCOG,
>          @RequestParam(name="anneeSIRENE") int anneeSIRENE,
>          @RequestParam(name="anneeComptesResultats") int anneeComptesResultats,
>          @RequestParam(name="niveauNAF") int niveauNAF,
>          @RequestParam(name="codesIntercommunalites") String[] codesIntercommunalites)
>          throws IntercommunaliteAbsenteDansCommuneSiegeException, TechniqueException, IOException {
>       SIRENCommune[] sirenIntercommunalites = new SIRENCommune[codesIntercommunalites.length];
>
>       for(int index=0; index < codesIntercommunalites.length; index ++) {
>          sirenIntercommunalites[index] = new SIRENCommune(codesIntercommunalites[index]);
>       }
>
>       File tempCSV = new File(this.environnement.getProperty("java.io.tmpdir")
>             + MessageFormat.format("{0,number,#0}", System.currentTimeMillis()));
>       File sortieCSV = this.impactActivitesCommunalesService.exporterCSV(tempCSV, anneeCOG,
>             anneeSIRENE, anneeComptesResultats, niveauNAF, sirenIntercommunalites);
>
>       StringBuilder contenuCSV = new StringBuilder();
>
>       try(Stream<String> stream = Files.lines(sortieCSV.toPath(), StandardCharsets.UTF_8)) {
>          stream.forEach(s -> contenuCSV.append(s).append("\n"));
>       }
>       return contenuCSV.toString();
>    }{code}
>  
> The Spring service creates a Dataset, then writes a CSV file from it, and returns 
> that CSV to the REST web-service (it will only have about 40-50 lines).
>  
> {code:java}
>    /**
>     * Export the income statements and activities by NAF level, for a series of
>     * inter-municipalities, to a CSV file.
>     * @param anneeCOG Year of the COG.
>     * @param anneeSIRENE Year of the SIRENE data to use when extracting the
>     *                    company/establishment data.
>     * @param anneeComptesResultats Year of the income-statement data to use.
>     * @param niveauNAF NAF grouping level.
>     * @param codesIntercommunalites EPCI / SIREN codes of the inter-municipalities.
>     * @return Export file of the income statements and main activities of the
>     *         municipalities, by NAF level.
>     * @throws IntercommunaliteAbsenteDansCommuneSiegeException if the requested
>     *         inter-municipality does not exist.
>     * @throws TechniqueException if a technical incident occurs.
>     */
>    public File exporterCSV(File sortieCSV, int anneeCOG, int anneeSIRENE,
>          int anneeComptesResultats, int niveauNAF, SIRENCommune... codesIntercommunalites)
>          throws IntercommunaliteAbsenteDansCommuneSiegeException, TechniqueException {
>       Objects.requireNonNull(sortieCSV, "Le fichier CSV de sortie ne peut pas valoir null.");
>
>       JavaPairRDD ActivitesCommunaleParNAF>> intercos =
>             rddActivitesEtComptesResultatsCommunes(anneeCOG, anneeSIRENE,
>             anneeComptesResultats, niveauNAF, codesIntercommunalites);
>       Dataset ds = toDataset(anneeCOG, intercos);
>       ds.coalesce(1).write().mode(SaveMode.Overwrite).option("header", "true")
>             .option("quoteMode", "NON_NUMERIC").option("quote", "\"")
>             .csv(sortieCSV.getAbsolutePath());
>
>       // Build the list of the .csv files that were produced.
>       try {
>          List

[jira] [Created] (SPARK-27840) Hadoop attempts to create a temporary folder in root folder

2019-05-25 Thread M. Le Bihan (JIRA)
M. Le Bihan created SPARK-27840:
---

 Summary: Hadoop attempts to create a temporary folder in root 
folder
 Key: SPARK-27840
 URL: https://issues.apache.org/jira/browse/SPARK-27840
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.3
Reporter: M. Le Bihan


I have a REST web-service that calls a Spring-boot service.

 
{code:java}
   /**
    * Export the income statements and activities by NAF level, for a series of
    * inter-municipalities, to a CSV file.
    * @param anneeCOG Year of the COG.
    * @param anneeSIRENE Year of the SIRENE data to use when extracting the
    *                    company/establishment data.
    * @param anneeComptesResultats Year of the income-statement data to use.
    * @param niveauNAF NAF grouping level.
    * @param codesIntercommunalites EPCI / SIREN codes of the inter-municipalities.
    * @return Export file of the income statements and main activities of the
    *         municipalities, by NAF level.
    * @throws IntercommunaliteAbsenteDansCommuneSiegeException if the requested
    *         inter-municipality does not exist.
    * @throws TechniqueException if a technical incident occurs.
    * @throws IOException
    */
   @RequestMapping("/activites/communes/exporterActivitesIntercommunalites")
   @Produces("text/csv")
   public String exporterCSV(@RequestParam(name="anneeCOG") int anneeCOG,
         @RequestParam(name="anneeSIRENE") int anneeSIRENE,
         @RequestParam(name="anneeComptesResultats") int anneeComptesResultats,
         @RequestParam(name="niveauNAF") int niveauNAF,
         @RequestParam(name="codesIntercommunalites") String[] codesIntercommunalites)
         throws IntercommunaliteAbsenteDansCommuneSiegeException, TechniqueException, IOException {
      SIRENCommune[] sirenIntercommunalites = new SIRENCommune[codesIntercommunalites.length];

      for(int index=0; index < codesIntercommunalites.length; index ++) {
         sirenIntercommunalites[index] = new SIRENCommune(codesIntercommunalites[index]);
      }

      File tempCSV = new File(this.environnement.getProperty("java.io.tmpdir")
            + MessageFormat.format("{0,number,#0}", System.currentTimeMillis()));
      File sortieCSV = this.impactActivitesCommunalesService.exporterCSV(tempCSV, anneeCOG,
            anneeSIRENE, anneeComptesResultats, niveauNAF, sirenIntercommunalites);

      StringBuilder contenuCSV = new StringBuilder();

      try(Stream<String> stream = Files.lines(sortieCSV.toPath(), StandardCharsets.UTF_8)) {
         stream.forEach(s -> contenuCSV.append(s).append("\n"));
      }

      return contenuCSV.toString();
   }{code}
 

The Spring service creates a Dataset, then writes a CSV file from it, and returns 
that CSV to the REST web-service (it will only have about 40-50 lines).

 
{code:java}
   /**
    * Export the income statements and activities by NAF level, for a series of
    * inter-municipalities, to a CSV file.
    * @param anneeCOG Year of the COG.
    * @param anneeSIRENE Year of the SIRENE data to use when extracting the
    *                    company/establishment data.
    * @param anneeComptesResultats Year of the income-statement data to use.
    * @param niveauNAF NAF grouping level.
    * @param codesIntercommunalites EPCI / SIREN codes of the inter-municipalities.
    * @return Export file of the income statements and main activities of the
    *         municipalities, by NAF level.
    * @throws IntercommunaliteAbsenteDansCommuneSiegeException if the requested
    *         inter-municipality does not exist.
    * @throws TechniqueException if a technical incident occurs.
    */
   public File exporterCSV(File sortieCSV, int anneeCOG, int anneeSIRENE,
         int anneeComptesResultats, int niveauNAF, SIRENCommune... codesIntercommunalites)
         throws IntercommunaliteAbsenteDansCommuneSiegeException, TechniqueException {
      Objects.requireNonNull(sortieCSV, "Le fichier CSV de sortie ne peut pas valoir null.");

      JavaPairRDD> intercos = rddActivitesEtComptesResultatsCommunes(anneeCOG, anneeSIRENE,
            anneeComptesResultats, niveauNAF, codesIntercommunalites);
      Dataset ds = toDataset(anneeCOG, intercos);
      ds.coalesce(1).write().mode(SaveMode.Overwrite).option("header", "true")
            .option("quoteMode", "NON_NUMERIC").option("quote", "\"")
            .csv(sortieCSV.getAbsolutePath());

      // Build the list of the .csv files that were produced.
      try {
         List<File> fichiersCSV = Files.walk(sortieCSV.toPath())        // walk the output directory,
            .map(c -> c.toFile())                                       // convert the Paths to Files,
            .filter(c -> c.isDirectory() == false && c.getName().endsWith(".csv")) // keep only the CSV files,
            .collect(Collectors.toList());                              // and return them as a list.
     
 

[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-25 Thread Jason Ferrell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848200#comment-16848200
 ] 

Jason Ferrell commented on SPARK-27837:
---

Is there any way to turn this into an enhancement request?  It doesn't seem 
very useful in SQL to pass the same literal seed to every single row of an 
operation.  The most advantageous behavior, I think, would be to pass an int 
column from the row, so that each row gets a random value seeded by a value on 
that row.
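
Not part of the ticket, but one possible workaround under the current rules: hash() accepts column arguments even though rand()'s seed must be a literal, so a per-row, column-seeded pseudo-random value can be derived by hashing the column and scaling into [0, 1). The query reuses the table and values from the example below; the scaling constant is arbitrary.

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ColumnSeededPseudoRandomSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("hash-seeded").getOrCreate();
    // pmod keeps the hash non-negative; dividing maps it into [0, 1).
    // Rows with equal val1 get equal values, since val1 acts as the seed here.
    Dataset<Row> result = spark.sql(
        "WITH a AS (SELECT 123 val1 UNION ALL SELECT 123 UNION ALL SELECT 123) " +
        "SELECT val1, pmod(hash(val1), 1000000) / 1000000.0 AS pseudo_rand FROM a");
    result.show();
  }
}
{code}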

> Running rand() in SQL with seed of column results in error (rand(col1))
> ---
>
> Key: SPARK-27837
> URL: https://issues.apache.org/jira/browse/SPARK-27837
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jason Ferrell
>Priority: Major
>
> Running this sql:
> with a as
> (
>  select 123 val1
>  union all
>  select 123 val1
>  union all
>  select 123 val1
> )
> select val1,rand(123),rand(val1)
> from a
> Results in error:  org.apache.spark.sql.AnalysisException: Input argument to 
> rand must be an integer, long or null literal.;
> It doesn't appear to recognize the value of the column as an int.  






[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-25 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848125#comment-16848125
 ] 

Liang-Chi Hsieh commented on SPARK-27837:
-

Please see the AnalysisException: Input argument to rand must be an integer, 
long or null literal.

> Running rand() in SQL with seed of column results in error (rand(col1))
> ---
>
> Key: SPARK-27837
> URL: https://issues.apache.org/jira/browse/SPARK-27837
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jason Ferrell
>Priority: Major
>
> Running this sql:
> with a as
> (
>  select 123 val1
>  union all
>  select 123 val1
>  union all
>  select 123 val1
> )
> select val1,rand(123),rand(val1)
> from a
> Results in error:  org.apache.spark.sql.AnalysisException: Input argument to 
> rand must be an integer, long or null literal.;
> It doesn't appear to recognize the value of the column as an int.  






[jira] [Commented] (SPARK-27836) Issue with seeded rand() function in Spark SQL

2019-05-25 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848124#comment-16848124
 ] 

Liang-Chi Hsieh commented on SPARK-27836:
-

The rand function is initialized only once with the given seed, at the 
beginning of the task; after that it keeps generating new random values, which 
is why the rows differ even though the seed is fixed.
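
A tiny stand-alone illustration of that point, using java.util.Random as a stand-in for Spark's internal generator: a fixed seed makes the sequence reproducible, but successive draws still differ, which is why the three rows in the report below show three different values.

{code:java}
import java.util.Random;

public class SeededSequenceSketch {
  public static void main(String[] args) {
    Random rng = new Random(123);         // seeded once, like rand(123) at task start
    System.out.println(rng.nextDouble()); // first row
    System.out.println(rng.nextDouble()); // second row: different value, same sequence
    System.out.println(rng.nextDouble()); // third row: different again, but reproducible
  }
}
{code}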

> Issue with seeded rand() function in Spark SQL
> --
>
> Key: SPARK-27836
> URL: https://issues.apache.org/jira/browse/SPARK-27836
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jason Ferrell
>Priority: Major
>
> This SQL:
> with a as
> (
>  select 123 val1
>  union all
>  select 123 val1
>  union all
>  select 123 val1
> )
> select val1,rand(123)
> from a
> Results in:
> |val1|rand(123)|
> |123|0.502953|
> |123|0.52307|
> |123|0.853569|
>  
> It should result in three rows all with value 0.502953






[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-25 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848123#comment-16848123
 ] 

Liang-Chi Hsieh commented on SPARK-27837:
-

The problem isn't that val1 isn't an int; it's that it isn't a literal. The 
seed to rand must be a literal.

> Running rand() in SQL with seed of column results in error (rand(col1))
> ---
>
> Key: SPARK-27837
> URL: https://issues.apache.org/jira/browse/SPARK-27837
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jason Ferrell
>Priority: Major
>
> Running this sql:
> with a as
> (
>  select 123 val1
>  union all
>  select 123 val1
>  union all
>  select 123 val1
> )
> select val1,rand(123),rand(val1)
> from a
> Results in error:  org.apache.spark.sql.AnalysisException: Input argument to 
> rand must be an integer, long or null literal.;
> It doesn't appear to recognize the value of the column as an int.  






[jira] [Commented] (SPARK-24149) Automatic namespaces discovery in HDFS federation

2019-05-25 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848097#comment-16848097
 ] 

Marco Gaido commented on SPARK-24149:
-

[~Dhruve Ashar] one use case for this change is a partitioned table whose 
partitions sit on different namespaces, with no viewFS configured. In that 
case, a user running a query on that table may or may not get an exception 
when reading it. Note also that the user running the query may be different 
from the user who created the table, so they may not be aware of the situation 
at all, and figuring out what the problem is can be pretty hard.

> Automatic namespaces discovery in HDFS federation
> -
>
> Key: SPARK-24149
> URL: https://issues.apache.org/jira/browse/SPARK-24149
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.4.0
>Reporter: Marco Gaido
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 2.4.0
>
>
> Hadoop 3 introduced HDFS federation.
> Spark fails to write on different namespaces when Hadoop federation is turned 
> on and the cluster is secure. This happens because Spark looks for the 
> delegation token only for the defaultFS configured and not for all the 
> available namespaces. A workaround is the usage of the property 
> {{spark.yarn.access.hadoopFileSystems}}.
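
A minimal sketch of the workaround mentioned above: list every namespace the job must reach so delegation tokens are requested for all of them, not just the defaultFS. The namespace URIs are placeholders.

{code:java}
import org.apache.spark.SparkConf;

public class FederatedNamespacesConfSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("federated-hdfs-job")
        // Hypothetical namespace URIs; one entry per federated namespace.
        .set("spark.yarn.access.hadoopFileSystems", "hdfs://ns1,hdfs://ns2");
    System.out.println(conf.get("spark.yarn.access.hadoopFileSystems"));
  }
}
{code}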






[jira] [Assigned] (SPARK-27839) Improve UTF8String.replace() / StringReplace performance

2019-05-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27839:


Assignee: Josh Rosen  (was: Apache Spark)

> Improve UTF8String.replace() / StringReplace performance
> 
>
> Key: SPARK-27839
> URL: https://issues.apache.org/jira/browse/SPARK-27839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>
> The UTF8String.replace() function and StringReplace expression are missing a 
> few common-case optimizations, such as avoiding copies when the replacement 
> does not change the string and avoiding redundant copying / decoding of the 
> search and replacement strings in case they are constants.
> I think there's room to significantly improve performance here, especially 
> for single-character replacements.






[jira] [Assigned] (SPARK-27839) Improve UTF8String.replace() / StringReplace performance

2019-05-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27839:


Assignee: Apache Spark  (was: Josh Rosen)

> Improve UTF8String.replace() / StringReplace performance
> 
>
> Key: SPARK-27839
> URL: https://issues.apache.org/jira/browse/SPARK-27839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Assignee: Apache Spark
>Priority: Major
>
> The UTF8String.replace() function and StringReplace expression are missing a 
> few common-case optimizations, such as avoiding copies when the replacement 
> does not change the string and avoiding redundant copying / decoding of the 
> search and replacement strings in case they are constants.
> I think there's room to significantly improve performance here, especially 
> for single-character replacements.






[jira] [Created] (SPARK-27839) Improve UTF8String.replace() / StringReplace performance

2019-05-25 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27839:
--

 Summary: Improve UTF8String.replace() / StringReplace performance
 Key: SPARK-27839
 URL: https://issues.apache.org/jira/browse/SPARK-27839
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Josh Rosen
Assignee: Josh Rosen


The UTF8String.replace() function and StringReplace expression are missing a 
few common-case optimizations, such as avoiding copies when the replacement 
does not change the string and avoiding redundant copying / decoding of the 
search and replacement strings in case they are constants.

I think there's room to significantly improve performance here, especially for 
single-character replacements.
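
A stand-alone sketch of the common-case shortcuts in question, written against plain java.lang.String rather than UTF8String: return the input untouched whenever the replacement cannot change it, so no new buffer is allocated in those cases.

{code:java}
public class ReplaceFastPathSketch {
  static String replace(String input, String search, String replacement) {
    if (search.isEmpty() || search.equals(replacement)) {
      return input;                 // the replacement cannot change the string
    }
    if (input.indexOf(search) < 0) {
      return input;                 // no occurrence: skip the copy entirely
    }
    // General path: build a new string only when something actually changes.
    return input.replace(search, replacement);
  }
}
{code}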


