[jira] [Updated] (SPARK-23762) UTF8StringBuilder uses MemoryBlock

2018-04-05 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-23762:
-
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-23879

> UTF8StringBuilder uses MemoryBlock
> --
>
> Key: SPARK-23762
> URL: https://issues.apache.org/jira/browse/SPARK-23762
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> This JIRA entry tries to use {{MemoryBlock}} in {{UTF8StringBuilder}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23763) OffHeapColumnVector uses MemoryBlock

2018-04-05 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-23763:
-
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-23879

> OffHeapColumnVector uses MemoryBlock
> 
>
> Key: SPARK-23763
> URL: https://issues.apache.org/jira/browse/SPARK-23763
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> This JIRA entry tries to use {{MemoryBlock}} in {{OffHeapColumnVector}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23779) TaskMemoryManager and UnsafeSorter use MemoryBlock

2018-04-05 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-23779:
-
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-23879

> TaskMemoryManager and UnsafeSorter use MemoryBlock
> --
>
> Key: SPARK-23779
> URL: https://issues.apache.org/jira/browse/SPARK-23779
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> This JIRA entry tries to use {{MemoryBlock}} in {{TaskMemoryManager}} and 
> classes related to {{UnsafeSorter}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23879) Introduce MemoryBlock API instead of Platform API with Object

2018-04-05 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-23879:
-
Component/s: Spark Core

> Introduce MemoryBlock API instead of Platform API with Object 
> --
>
> Key: SPARK-23879
> URL: https://issues.apache.org/jira/browse/SPARK-23879
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> This JIRA is derived from SPARK-10399.
> During the discussion, the community found that the current Spark framework 
> directly accesses several types of memory regions (e.g. {{byte[]}}, {{long[]}}, 
> or off-heap) through the {{Platform}} API with the {{Object}} type. It would be 
> good to have a unified memory management API for a clear memory model.
> It is also good for performance: passing a typed object (e.g. {{byte[]}}, 
> {{long[]}}) to {{Platform.getXX()/putXX()}} is faster than using 
> {{Platform.getXX(Object)/putXX(Object, ...)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23879) Introduce MemoryBlock API instead of Platform API with Object

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23879:


Assignee: (was: Apache Spark)

> Introduce MemoryBlock API instead of Platform API with Object 
> --
>
> Key: SPARK-23879
> URL: https://issues.apache.org/jira/browse/SPARK-23879
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> This JIRA is derived from SPARK-10399.
> During the discussion, the community found that the current Spark framework 
> directly accesses several types of memory regions (e.g. {{byte[]}}, {{long[]}}, 
> or off-heap) through the {{Platform}} API with the {{Object}} type. It would be 
> good to have a unified memory management API for a clear memory model.
> It is also good for performance: passing a typed object (e.g. {{byte[]}}, 
> {{long[]}}) to {{Platform.getXX()/putXX()}} is faster than using 
> {{Platform.getXX(Object)/putXX(Object, ...)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23879) Introduce MemoryBlock API instead of Platform API with Object

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23879:


Assignee: Apache Spark

> Introduce MemoryBlock API instead of Platform API with Object 
> --
>
> Key: SPARK-23879
> URL: https://issues.apache.org/jira/browse/SPARK-23879
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Assignee: Apache Spark
>Priority: Major
>
> This JIRA is derived from SPARK-10399.
> During the discussion, the community found that the current Spark framework 
> directly accesses several types of memory regions (e.g. {{byte[]}}, {{long[]}}, 
> or off-heap) through the {{Platform}} API with the {{Object}} type. It would be 
> good to have a unified memory management API for a clear memory model.
> It is also good for performance: passing a typed object (e.g. {{byte[]}}, 
> {{long[]}}) to {{Platform.getXX()/putXX()}} is faster than using 
> {{Platform.getXX(Object)/putXX(Object, ...)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23879) Introduce MemoryBlock API instead of Platform API with Object

2018-04-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427965#comment-16427965
 ] 

Apache Spark commented on SPARK-23879:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/19222

> Introduce MemoryBlock API instead of Platform API with Object 
> --
>
> Key: SPARK-23879
> URL: https://issues.apache.org/jira/browse/SPARK-23879
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> This JIRA is derived from SPARK-10399.
> During the discussion, the community found that the current Spark framework 
> directly accesses several types of memory regions (e.g. {{byte[]}}, {{long[]}}, 
> or off-heap) through the {{Platform}} API with the {{Object}} type. It would be 
> good to have a unified memory management API for a clear memory model.
> It is also good for performance: passing a typed object (e.g. {{byte[]}}, 
> {{long[]}}) to {{Platform.getXX()/putXX()}} is faster than using 
> {{Platform.getXX(Object)/putXX(Object, ...)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23879) Introduce MemoryBlock API instead of Platform API with Object

2018-04-05 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-23879:


 Summary: Introduce MemoryBlock API instead of Platform API with 
Object 
 Key: SPARK-23879
 URL: https://issues.apache.org/jira/browse/SPARK-23879
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Kazuaki Ishizaki


This JIRA is derived from SPARK-10399.

During the discussion, the community found that the current Spark framework 
directly accesses several types of memory regions (e.g. {{byte[]}}, {{long[]}}, 
or off-heap) through the {{Platform}} API with the {{Object}} type. It would be 
good to have a unified memory management API for a clear memory model.

It is also good for performance: passing a typed object (e.g. {{byte[]}}, 
{{long[]}}) to {{Platform.getXX()/putXX()}} is faster than using 
{{Platform.getXX(Object)/putXX(Object, ...)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23847) Add asc_nulls_first, asc_nulls_last to PySpark

2018-04-05 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-23847:
-
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-23483

> Add asc_nulls_first, asc_nulls_last to PySpark
> --
>
> Key: SPARK-23847
> URL: https://issues.apache.org/jira/browse/SPARK-23847
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: Huaxin Gao
>Priority: Minor
>
> Column.scala and Functions.scala have asc_nulls_first, asc_nulls_last,  
> desc_nulls_first and desc_nulls_last. Add the corresponding Python APIs in 
> PySpark. 
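For illustration, a minimal sketch of what the requested Python wrappers could look 
like, following the usual PySpark pattern of delegating to the existing JVM-side 
functions; the code below is an assumption about the shape of the change, not the 
actual patch.

{code:python}
# Illustrative sketch only, not the actual SPARK-23847 patch. It assumes the
# common PySpark pattern of wrapping the existing JVM-side function.
from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column

def asc_nulls_first(col):
    """Returns a sort expression: ascending order, nulls before non-null values."""
    sc = SparkContext._active_spark_context
    return Column(sc._jvm.functions.asc_nulls_first(_to_java_column(col)))

# Usage sketch: df.orderBy(asc_nulls_first("age"))
{code}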



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23856) Spark jdbc setQueryTimeout option

2018-04-05 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427890#comment-16427890
 ] 

Xiao Li commented on SPARK-23856:
-

Yes, this sounds reasonable. Thanks! Please submit the PR to add it. 

> Spark jdbc setQueryTimeout option
> -
>
> Key: SPARK-23856
> URL: https://issues.apache.org/jira/browse/SPARK-23856
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dmitry Mikhailov
>Priority: Minor
>
> It would be nice if a user could set the JDBC setQueryTimeout option when 
> running JDBC in Spark. I think it can be easily implemented by adding an option 
> field to the _JDBCOptions_ class and applying this option when initializing JDBC 
> statements in the _JDBCRDD_ class. However, only some DB vendors support this JDBC 
> feature. Is it worth starting work on this option?
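For illustration only, a hypothetical sketch of the user-facing side of such an 
option; the option name "queryTimeout" is assumed here (it is not an existing 
Spark 2.3 option) and would ultimately map to java.sql.Statement.setQueryTimeout 
as described above. It assumes an existing SparkSession named spark.

{code:python}
# Hypothetical usage sketch: the "queryTimeout" option name is an assumption,
# not an existing Spark 2.3 option; it would map to Statement.setQueryTimeout.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")  # illustrative URL
      .option("dbtable", "public.events")                   # illustrative table
      .option("queryTimeout", "30")                         # proposed option, in seconds
      .load())
{code}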



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23878) unable to import col() or lit()

2018-04-05 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427889#comment-16427889
 ] 

Hyukjin Kwon edited comment on SPARK-23878 at 4/6/18 3:34 AM:
--

Hm, I use PyCharm and it has been working fine for me, although it shows a 
warning, but I was thinking this case is pretty common and not a big deal. Can 
you maybe check out the configuration in Eclipse? I think it's pretty common to 
define namespaces dynamically, and I wonder how Eclipse would work with all of them.


was (Author: hyukjin.kwon):
Hm, I use PyCharm and it has been working fine for me, although it shows a 
warning, which is pretty common and not a big deal. Can you maybe check out the 
configuration in Eclipse? I think it's pretty common to define namespaces 
dynamically, and I wonder how Eclipse would work with all of them.

> unable to import col() or lit()
> ---
>
> Key: SPARK-23878
> URL: https://issues.apache.org/jira/browse/SPARK-23878
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
> Environment: eclipse 4.7.3
> pyDev 6.3.2
> pyspark==2.3.0
>Reporter: Andrew Davidson
>Priority: Major
>
> I have some code I am moving from a Jupyter notebook to separate Python 
> modules. My notebook uses col() and lit() and works fine.
> When I try to work with module files in my IDE I get the following errors. I 
> am also not able to run my unit tests.
> {color:#FF}Description Resource Path Location Type{color}
> {color:#FF}Unresolved import: lit load.py 
> /adt_pyDevProj/src/automatedDataTranslation line 22 PyDev Problem{color}
> {color:#FF}Description Resource Path Location Type{color}
> {color:#FF}Unresolved import: col load.py 
> /adt_pyDevProj/src/automatedDataTranslation line 21 PyDev Problem{color}
> I suspect that when you run pyspark it is generating the col and lit 
> functions?
> I found a description of the problem at 
> [https://stackoverflow.com/questions/40163106/cannot-find-col-function-in-pyspark].
> I do not understand how to make this work in my IDE. I am not running 
> pyspark, just an editor.
> Is there some sort of workaround or replacement for these missing functions?
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23878) unable to import col() or lit()

2018-04-05 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427889#comment-16427889
 ] 

Hyukjin Kwon commented on SPARK-23878:
--

Hm, I use PyCharm and it has been working fine for me, although it shows a 
warning, which is pretty common and not a big deal. Can you maybe check out the 
configuration in Eclipse? I think it's pretty common to define namespaces 
dynamically, and I wonder how Eclipse would work with all of them.

> unable to import col() or lit()
> ---
>
> Key: SPARK-23878
> URL: https://issues.apache.org/jira/browse/SPARK-23878
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
> Environment: eclipse 4.7.3
> pyDev 6.3.2
> pyspark==2.3.0
>Reporter: Andrew Davidson
>Priority: Major
>
> I have some code I am moving from a Jupyter notebook to separate Python 
> modules. My notebook uses col() and lit() and works fine.
> When I try to work with module files in my IDE I get the following errors. I 
> am also not able to run my unit tests.
> {color:#FF}Description Resource Path Location Type{color}
> {color:#FF}Unresolved import: lit load.py 
> /adt_pyDevProj/src/automatedDataTranslation line 22 PyDev Problem{color}
> {color:#FF}Description Resource Path Location Type{color}
> {color:#FF}Unresolved import: col load.py 
> /adt_pyDevProj/src/automatedDataTranslation line 21 PyDev Problem{color}
> I suspect that when you run pyspark it is generating the col and lit 
> functions?
> I found a description of the problem at 
> [https://stackoverflow.com/questions/40163106/cannot-find-col-function-in-pyspark].
> I do not understand how to make this work in my IDE. I am not running 
> pyspark, just an editor.
> Is there some sort of workaround or replacement for these missing functions?
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23856) Spark jdbc setQueryTimeout option

2018-04-05 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-23856:
---

Assignee: (was: Xiao Li)

> Spark jdbc setQueryTimeout option
> -
>
> Key: SPARK-23856
> URL: https://issues.apache.org/jira/browse/SPARK-23856
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dmitry Mikhailov
>Priority: Minor
>
> It would be nice if a user could set the JDBC setQueryTimeout option when 
> running JDBC in Spark. I think it can be easily implemented by adding an option 
> field to the _JDBCOptions_ class and applying this option when initializing JDBC 
> statements in the _JDBCRDD_ class. However, only some DB vendors support this JDBC 
> feature. Is it worth starting work on this option?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23856) Spark jdbc setQueryTimeout option

2018-04-05 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-23856:
---

Assignee: Xiao Li

> Spark jdbc setQueryTimeout option
> -
>
> Key: SPARK-23856
> URL: https://issues.apache.org/jira/browse/SPARK-23856
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dmitry Mikhailov
>Assignee: Xiao Li
>Priority: Minor
>
> It would be nice if a user could set the JDBC setQueryTimeout option when 
> running JDBC in Spark. I think it can be easily implemented by adding an option 
> field to the _JDBCOptions_ class and applying this option when initializing JDBC 
> statements in the _JDBCRDD_ class. However, only some DB vendors support this JDBC 
> feature. Is it worth starting work on this option?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23878) unable to import col() or lit()

2018-04-05 Thread Andrew Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427883#comment-16427883
 ] 

Andrew Davidson commented on SPARK-23878:
-

Hi Hyukjin

you are correct. Most IDEs are primarily language-aware editors and builders. 
For example, consider Eclipse or IntelliJ for developing a JavaScript website or 
a Java servlet. The editor functionality knows about the syntax of the language 
you are working with, along with the libraries and packages you are using. Often 
the IDE does some sort of continuous build or code analysis to help you find 
bugs without having to deploy.

Often the IDE also makes it easy to build, package, and actually deploy on some 
sort of test server, and to debug and/or run unit tests.

So if pyspark is generating functions at run time, that is going to cause 
problems for the IDE: the functions are not defined in the edit session.

[http://www.learn4master.com/algorithms/pyspark-unit-test-set-up-sparkcontext] 
describes how to write unit tests for pyspark that you can run from your 
command line and/or from within Eclipse. I think a side effect is that they 
might cause the functions lit() and col() to be generated?

I could not find a workaround for col() and lit().

    ret = df.select(
        col(columnName).cast("string").alias("key"),
        lit(value).alias("source")
    )

Kind regards

Andy
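For what it's worth, a small sketch of the kind of workaround that usually keeps 
both the runtime and an IDE's static checker happy; it assumes df, columnName, 
and value from the snippet above, and it is only a sketch, not a PyDev-specific fix.

{code:python}
# Sketch, not a definitive fix: both imports below work at run time because
# pyspark.sql.functions generates col()/lit() when the package is imported.
# Importing the module itself (as F) usually avoids the per-name
# "Unresolved import" warning, since the module resolves statically.
from pyspark.sql import functions as F

ret = df.select(
    F.col(columnName).cast("string").alias("key"),
    F.lit(value).alias("source"),
)
{code}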

> unable to import col() or lit()
> ---
>
> Key: SPARK-23878
> URL: https://issues.apache.org/jira/browse/SPARK-23878
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
> Environment: eclipse 4.7.3
> pyDev 6.3.2
> pyspark==2.3.0
>Reporter: Andrew Davidson
>Priority: Major
>
> I have some code I am moving from a Jupyter notebook to separate Python 
> modules. My notebook uses col() and lit() and works fine.
> When I try to work with module files in my IDE I get the following errors. I 
> am also not able to run my unit tests.
> {color:#FF}Description Resource Path Location Type{color}
> {color:#FF}Unresolved import: lit load.py 
> /adt_pyDevProj/src/automatedDataTranslation line 22 PyDev Problem{color}
> {color:#FF}Description Resource Path Location Type{color}
> {color:#FF}Unresolved import: col load.py 
> /adt_pyDevProj/src/automatedDataTranslation line 21 PyDev Problem{color}
> I suspect that when you run pyspark it is generating the col and lit 
> functions?
> I found a description of the problem at 
> [https://stackoverflow.com/questions/40163106/cannot-find-col-function-in-pyspark].
> I do not understand how to make this work in my IDE. I am not running 
> pyspark, just an editor.
> Is there some sort of workaround or replacement for these missing functions?
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23817) Migrate ORC file format read path to data source V2

2018-04-05 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-23817.
-
   Resolution: Fixed
 Assignee: Gengliang Wang
Fix Version/s: 2.4.0

> Migrate ORC file format read path to data source V2
> ---
>
> Key: SPARK-23817
> URL: https://issues.apache.org/jira/browse/SPARK-23817
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 2.4.0
>
>
> Migrate ORC file format read path to data source V2. 
> Supports:
>  # Scan ColumnarBatch
>  # Scan UnsafeRow
>  # Push down filters
>  # Push down required columns
> Not supported (due to limitations of data source V2):
>  # Read multiple file paths
>  # Read bucketed file.
>  
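From the user side, a read that exercises the supported features above looks like an 
ordinary ORC read with column pruning and a filter; the path and column names below 
are made up for illustration, and an existing SparkSession named spark is assumed.

{code:python}
# Illustrative only: a plain ORC read whose plan exercises the pushdowns listed
# above (path and column names are made up).
df = (spark.read.orc("/data/events.orc")
      .select("id", "ts")     # required-column pushdown
      .where("id > 100"))     # filter pushdown
df.explain()
{code}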



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23863) Wholetext mode should not add line breaks

2018-04-05 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-23863:
-
Fix Version/s: (was: 2.3.1)

> Wholetext mode should not add line breaks
> -
>
> Key: SPARK-23863
> URL: https://issues.apache.org/jira/browse/SPARK-23863
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: chencheng
>Priority: Major
>
> We merged multiple files in the text format. When using select count(1) 
> to count the merged results, we found that the result in wholetext 
> mode was inconsistent with that of the normal mode.
> E.g.:
>  Combine 10 text files with a total of 100 lines. When the wholeTextMode 
> parameter of the buildReader method is set to true, the result of select 
> count(1) will be 110.
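A minimal repro sketch of the scenario described, using the public wholetext reader 
option (the path is illustrative; for 10 files with 100 lines in total one would 
expect 100 rows in line mode and 10 rows in wholetext mode). An existing 
SparkSession named spark is assumed.

{code:python}
# Illustrative repro sketch: count rows for the same files in normal line mode
# vs. wholetext mode (the path is made up).
df_lines = spark.read.text("/data/merged/*.txt")                               # one row per line
df_whole = spark.read.option("wholetext", "true").text("/data/merged/*.txt")   # one row per file
print(df_lines.count(), df_whole.count())
{code}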



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23863) Wholetext mode should not add line breaks

2018-04-05 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-23863.
--
Resolution: Invalid

> Wholetext mode should not add line breaks
> -
>
> Key: SPARK-23863
> URL: https://issues.apache.org/jira/browse/SPARK-23863
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: chencheng
>Priority: Major
> Fix For: 2.3.1
>
>
> We merged multiple files in the text format. When using select count(1) 
> to count the merged results, we found that the result in wholetext 
> mode was inconsistent with that of the normal mode.
> E.g.:
>  Combine 10 text files with a total of 100 lines. When the wholeTextMode 
> parameter of the buildReader method is set to true, the result of select 
> count(1) will be 110.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23823) ResolveReferences loses correct origin

2018-04-05 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-23823.
-
   Resolution: Fixed
 Assignee: Jiahui Jiang
Fix Version/s: 2.4.0
   2.3.1

> ResolveReferences loses correct origin
> --
>
> Key: SPARK-23823
> URL: https://issues.apache.org/jira/browse/SPARK-23823
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Jiahui Jiang
>Assignee: Jiahui Jiang
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> Introduced in [https://github.com/apache/spark/pull/19585]
> ResolveReferences stopped doing transformUp after this change, and Attributes 
> sometimes lose their correct origin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23870) Forward RFormula handleInvalid Param to VectorAssembler

2018-04-05 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-23870.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

Resolved via https://github.com/apache/spark/pull/20970

>  Forward RFormula handleInvalid Param to VectorAssembler
> 
>
> Key: SPARK-23870
> URL: https://issues.apache.org/jira/browse/SPARK-23870
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: yogesh garg
>Assignee: yogesh garg
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23870) Forward RFormula handleInvalid Param to VectorAssembler

2018-04-05 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley reassigned SPARK-23870:
-

Assignee: yogesh garg

>  Forward RFormula handleInvalid Param to VectorAssembler
> 
>
> Key: SPARK-23870
> URL: https://issues.apache.org/jira/browse/SPARK-23870
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: yogesh garg
>Assignee: yogesh garg
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23870) Forward RFormula handleInvalid Param to VectorAssembler

2018-04-05 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-23870:
--
Fix Version/s: (was: 2.4.0)

>  Forward RFormula handleInvalid Param to VectorAssembler
> 
>
> Key: SPARK-23870
> URL: https://issues.apache.org/jira/browse/SPARK-23870
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: yogesh garg
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23878) unable to import col() or lit()

2018-04-05 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427812#comment-16427812
 ] 

Hyukjin Kwon commented on SPARK-23878:
--

So, just to be clear, it works fine but the IDE doesn't detect them since they are 
defined dynamically?

> unable to import col() or lit()
> ---
>
> Key: SPARK-23878
> URL: https://issues.apache.org/jira/browse/SPARK-23878
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
> Environment: eclipse 4.7.3
> pyDev 6.3.2
> pyspark==2.3.0
>Reporter: Andrew Davidson
>Priority: Major
>
> I have some code I am moving from a Jupyter notebook to separate Python 
> modules. My notebook uses col() and lit() and works fine.
> When I try to work with module files in my IDE I get the following errors. I 
> am also not able to run my unit tests.
> {color:#FF}Description Resource Path Location Type{color}
> {color:#FF}Unresolved import: lit load.py 
> /adt_pyDevProj/src/automatedDataTranslation line 22 PyDev Problem{color}
> {color:#FF}Description Resource Path Location Type{color}
> {color:#FF}Unresolved import: col load.py 
> /adt_pyDevProj/src/automatedDataTranslation line 21 PyDev Problem{color}
> I suspect that when you run pyspark it is generating the col and lit 
> functions?
> I found a description of the problem at 
> [https://stackoverflow.com/questions/40163106/cannot-find-col-function-in-pyspark].
> I do not understand how to make this work in my IDE. I am not running 
> pyspark, just an editor.
> Is there some sort of workaround or replacement for these missing functions?
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13587) Support virtualenv in PySpark

2018-04-05 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-13587:


Assignee: Jeff Zhang

> Support virtualenv in PySpark
> -
>
> Key: SPARK-13587
> URL: https://issues.apache.org/jira/browse/SPARK-13587
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>Priority: Major
>
> Currently, it's not easy for a user to add third-party Python packages in 
> pyspark.
> * One way is to use --py-files (suitable for a simple dependency, but not 
> for a complicated dependency, especially one with transitive dependencies)
> * Another way is to install packages manually on each node (time-consuming, and 
> not easy to switch to a different environment)
> Python now has 2 different virtualenv implementations: one is native 
> virtualenv, the other is through conda. This JIRA is trying to bring these 2 
> tools to a distributed environment.
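For reference, the simple-dependency workaround mentioned above can also be done 
programmatically; the archive path below is illustrative.

{code:python}
# Existing workaround for simple, pure-Python dependencies: ship an archive with
# the job. Complicated or transitive dependencies are the case this JIRA targets
# with virtualenv/conda.
from pyspark import SparkContext

sc = SparkContext(appName="py-deps-example")
sc.addPyFile("/path/to/dependency.zip")  # illustrative path
{code}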



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23878) unable to import col() or lit()

2018-04-05 Thread Andrew Davidson (JIRA)
Andrew Davidson created SPARK-23878:
---

 Summary: unable to import col() or lit()
 Key: SPARK-23878
 URL: https://issues.apache.org/jira/browse/SPARK-23878
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.3.0
 Environment: eclipse 4.7.3

pyDev 6.3.2

pyspark==2.3.0
Reporter: Andrew Davidson


I have some code I am moving from a Jupyter notebook to separate Python 
modules. My notebook uses col() and lit() and works fine.

When I try to work with module files in my IDE I get the following errors. I am 
also not able to run my unit tests.

{color:#FF}Description Resource Path Location Type{color}
{color:#FF}Unresolved import: lit load.py 
/adt_pyDevProj/src/automatedDataTranslation line 22 PyDev Problem{color}

{color:#FF}Description Resource Path Location Type{color}
{color:#FF}Unresolved import: col load.py 
/adt_pyDevProj/src/automatedDataTranslation line 21 PyDev Problem{color}

I suspect that when you run pyspark it is generating the col and lit functions?

I found a description of the problem at 
[https://stackoverflow.com/questions/40163106/cannot-find-col-function-in-pyspark].
I do not understand how to make this work in my IDE. I am not running pyspark, 
just an editor.

Is there some sort of workaround or replacement for these missing functions?

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.9.0

2018-04-05 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427791#comment-16427791
 ] 

Hyukjin Kwon commented on SPARK-23874:
--

cc [~ueshin] too

> Upgrade apache/arrow to 0.9.0
> -
>
> Key: SPARK-23874
> URL: https://issues.apache.org/jira/browse/SPARK-23874
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Bryan Cutler
>Priority: Major
>
> Version 0.9.0 of Apache Arrow comes with a bug fix related to array 
> serialization. 
> https://issues.apache.org/jira/browse/ARROW-1973
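For context, the Arrow version matters on the PySpark side wherever Arrow-backed 
conversion is enabled, e.g. (assuming an existing SparkSession named spark):

{code:python}
# Where the Arrow dependency is exercised in PySpark 2.3: Arrow-backed
# conversion between Spark and pandas.
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
pdf = spark.range(1 << 20).toPandas()  # uses Arrow serialization when enabled
{code}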



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23529) Specify hostpath volume and mount the volume in Spark driver and executor pods in Kubernetes

2018-04-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427746#comment-16427746
 ] 

Apache Spark commented on SPARK-23529:
--

User 'madanadit' has created a pull request for this issue:
https://github.com/apache/spark/pull/20989

> Specify hostpath volume and mount the volume in Spark driver and executor 
> pods in Kubernetes
> 
>
> Key: SPARK-23529
> URL: https://issues.apache.org/jira/browse/SPARK-23529
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Suman Somasundar
>Assignee: Anirudh Ramanathan
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23529) Specify hostpath volume and mount the volume in Spark driver and executor pods in Kubernetes

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23529:


Assignee: Anirudh Ramanathan  (was: Apache Spark)

> Specify hostpath volume and mount the volume in Spark driver and executor 
> pods in Kubernetes
> 
>
> Key: SPARK-23529
> URL: https://issues.apache.org/jira/browse/SPARK-23529
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Suman Somasundar
>Assignee: Anirudh Ramanathan
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23529) Specify hostpath volume and mount the volume in Spark driver and executor pods in Kubernetes

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23529:


Assignee: Apache Spark  (was: Anirudh Ramanathan)

> Specify hostpath volume and mount the volume in Spark driver and executor 
> pods in Kubernetes
> 
>
> Key: SPARK-23529
> URL: https://issues.apache.org/jira/browse/SPARK-23529
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Suman Somasundar
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23877) Metadata-only queries do not push down filter conditions

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23877:


Assignee: Apache Spark

> Metadata-only queries do not push down filter conditions
> 
>
> Key: SPARK-23877
> URL: https://issues.apache.org/jira/browse/SPARK-23877
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Assignee: Apache Spark
>Priority: Major
> Fix For: 2.4.0
>
>
> Metadata-only queries currently load all partitions in a table instead of 
> passing filter conditions to list only matching partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23877) Metadata-only queries do not push down filter conditions

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23877:


Assignee: (was: Apache Spark)

> Metadata-only queries do not push down filter conditions
> 
>
> Key: SPARK-23877
> URL: https://issues.apache.org/jira/browse/SPARK-23877
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Priority: Major
> Fix For: 2.4.0
>
>
> Metadata-only queries currently load all partitions in a table instead of 
> passing filter conditions to list only matching partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23877) Metadata-only queries do not push down filter conditions

2018-04-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427743#comment-16427743
 ] 

Apache Spark commented on SPARK-23877:
--

User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/20988

> Metadata-only queries do not push down filter conditions
> 
>
> Key: SPARK-23877
> URL: https://issues.apache.org/jira/browse/SPARK-23877
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Priority: Major
> Fix For: 2.4.0
>
>
> Metadata-only queries currently load all partitions in a table instead of 
> passing filter conditions to list only matching partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23877) Metadata-only queries do not push down filter conditions

2018-04-05 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-23877:
-

 Summary: Metadata-only queries do not push down filter conditions
 Key: SPARK-23877
 URL: https://issues.apache.org/jira/browse/SPARK-23877
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Ryan Blue
 Fix For: 2.4.0


Metadata-only queries currently load all partitions in a table instead of 
passing filter conditions to list only matching partitions.
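For illustration, the kind of query affected is one that touches only partition 
columns and carries a partition filter; the table and column names below are made 
up, and an existing SparkSession named spark is assumed.

{code:python}
# Illustrative metadata-only query: it reads only the partition column, so it can
# be answered from the catalog's partition listing, and the filter on the
# partition column should restrict which partitions are listed.
spark.sql("SELECT MAX(ds) FROM events WHERE ds >= '2018-01-01'").show()
{code}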



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23874) Upgrade apache/arrow to 0.9.0

2018-04-05 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-23874:
---

Assignee: Bryan Cutler

> Upgrade apache/arrow to 0.9.0
> -
>
> Key: SPARK-23874
> URL: https://issues.apache.org/jira/browse/SPARK-23874
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Bryan Cutler
>Priority: Major
>
> Version 0.9.0 of Apache Arrow comes with a bug fix related to array 
> serialization. 
> https://issues.apache.org/jira/browse/ARROW-1973



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.9.0

2018-04-05 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427717#comment-16427717
 ] 

Xiao Li commented on SPARK-23874:
-

Thank you! [~bryanc][~hyukjin.kwon][~shaneknapp]

> Upgrade apache/arrow to 0.9.0
> -
>
> Key: SPARK-23874
> URL: https://issues.apache.org/jira/browse/SPARK-23874
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Bryan Cutler
>Priority: Major
>
> Version 0.9.0 of Apache Arrow comes with a bug fix related to array 
> serialization. 
> https://issues.apache.org/jira/browse/ARROW-1973



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.9.0

2018-04-05 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427712#comment-16427712
 ] 

Bryan Cutler commented on SPARK-23874:
--

I can work on this.  I wasn't able to recreate the linked issue within Spark, 
but maybe I just wasn't hitting the right place so it would be good to upgrade 
anyway.  I'll try to get the code changes done next week and then help 
coordinate the Jenkins upgrades.

> Upgrade apache/arrow to 0.9.0
> -
>
> Key: SPARK-23874
> URL: https://issues.apache.org/jira/browse/SPARK-23874
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
>
> Version 0.9.0 of Apache Arrow comes with a bug fix related to array 
> serialization. 
> https://issues.apache.org/jira/browse/ARROW-1973



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21274) Implement EXCEPT ALL and INTERSECT ALL

2018-04-05 Thread Dilip Biswal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427689#comment-16427689
 ] 

Dilip Biswal commented on SPARK-21274:
--

Thank you [~smilegator]

Here is the link.
https://drive.google.com/open?id=1nyW0T0b_ajUduQoPgZLAsyHK8s3_dko3ulQuxaLpUXE

> Implement EXCEPT ALL and INTERSECT ALL
> --
>
> Key: SPARK-21274
> URL: https://issues.apache.org/jira/browse/SPARK-21274
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> 1) *EXCEPT ALL* / MINUS ALL :
> {code}
> SELECT a,b,c FROM tab1
>  EXCEPT ALL 
> SELECT a,b,c FROM tab2
> {code}
> can be rewritten as the following outer join:
> {code}
> SELECT a,b,c
> FROM tab1 t1
>  LEFT OUTER JOIN 
> tab2 t2
>  ON (
> (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
>  )
> WHERE
> COALESCE(t2.a, t2.b, t2.c) IS NULL
> {code}
> (register this second query as a temp view under the name "*t1_except_t2_df*", 
> which can also be used to find INTERSECT ALL below):
> 2) *INTERSECT ALL*:
> {code}
> SELECT a,b,c FROM tab1
>  INTERSECT ALL 
> SELECT a,b,c FROM tab2
> {code}
> can be rewritten as the following anti-join, using the t1_except_t2_df we defined 
> above:
> {code}
> SELECT a,b,c
> FROMtab1 t1
> WHERE 
>NOT EXISTS
>(SELECT 1
> FROM t1_except_t2_df e
> WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
>)
> {code}
> So the suggestion is just to use the above query rewrites to implement both 
> EXCEPT ALL and INTERSECT ALL SQL set operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23529) Specify hostpath volume and mount the volume in Spark driver and executor pods in Kubernetes

2018-04-05 Thread Adit Madan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427650#comment-16427650
 ] 

Adit Madan edited comment on SPARK-23529 at 4/5/18 9:50 PM:


Hi [~foxish], I also have a use case for using hostpath volumes and would be 
happy to contribute the implementation. 

Summary: Enable short-circuit writes to distributed storage on k8s.

The Alluxio File System uses domain sockets to enable short-circuit writes from 
the client to worker memory when co-located on the same host machine. A 
directory, let's say /tmp/domain on the host, is mounted on the Alluxio worker 
container as well as the Alluxio client (= Spark executor) container. The 
worker creates a domain socket /tmp/domain/d, and if the client container mounts 
the same directory, it can write directly to the Alluxio worker without passing 
through the network stack. The end result is faster data access when data is local.

Appreciate your thoughts on this! I have an implementation ready exposing a 
new property spark.kubernetes.executor.volumes taking a value of the form 
hostPath:containerPath[:ro|rw].


was (Author: madanadit):
Hi [~foxish], I also have a use case for using hostpath volumes and would be 
happy to contribute the implementation. 

Summary: Enable short-circuit writes to distributed storage on k8s.

The Alluxio File System uses domain sockets to enable short-circuit writes from 
the client to worker memory when co-located on the same host machine. A 
directory, lets say /tmp/domain on the host, is mounted on the Alluxio worker 
container as well as the Alluxio client ( = Spark executor) container. The 
worker creates a domain socket /tmp/domain/d and if the client container mounts 
the same directory, it can write directory to the Alluxio worker w/o passing 
through network stack. The end result is faster data access when data is local.

Appreciate your thoughts on this!. I have an implementation ready exposing a 
new property spark.kubernetes.executor.volumes taking the value of the form 
hostPath:containerPath[:ro|rw].  

> Specify hostpath volume and mount the volume in Spark driver and executor 
> pods in Kubernetes
> 
>
> Key: SPARK-23529
> URL: https://issues.apache.org/jira/browse/SPARK-23529
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Suman Somasundar
>Assignee: Anirudh Ramanathan
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23529) Specify hostpath volume and mount the volume in Spark driver and executor pods in Kubernetes

2018-04-05 Thread Adit Madan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427650#comment-16427650
 ] 

Adit Madan commented on SPARK-23529:


Hi [~foxish], I also have a use case for using hostpath volumes and would be 
happy to contribute the implementation. 

Summary: Enable short-circuit writes to distributed storage on k8s.

The Alluxio File System uses domain sockets to enable short-circuit writes from 
the client to worker memory when co-located on the same host machine. A 
directory, let's say /tmp/domain on the host, is mounted on the Alluxio worker 
container as well as the Alluxio client (= Spark executor) container. The 
worker creates a domain socket /tmp/domain/d, and if the client container mounts 
the same directory, it can write directly to the Alluxio worker without passing 
through the network stack. The end result is faster data access when data is local.

Appreciate your thoughts on this! I have an implementation ready exposing a 
new property spark.kubernetes.executor.volumes taking a value of the form 
hostPath:containerPath[:ro|rw].
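As a concrete sketch of the proposal (the property name and value format are the 
ones proposed in this comment, not a released Spark option):

{code:python}
# Sketch of the proposed configuration only: "spark.kubernetes.executor.volumes"
# is the property suggested in this comment, not an existing Spark 2.3 setting.
from pyspark import SparkConf

conf = SparkConf().set(
    "spark.kubernetes.executor.volumes",
    "/tmp/domain:/tmp/domain:rw",  # hostPath:containerPath[:ro|rw]
)
{code}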

> Specify hostpath volume and mount the volume in Spark driver and executor 
> pods in Kubernetes
> 
>
> Key: SPARK-23529
> URL: https://issues.apache.org/jira/browse/SPARK-23529
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Suman Somasundar
>Assignee: Anirudh Ramanathan
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23816) FetchFailedException when killing speculative task

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23816:


Assignee: (was: Apache Spark)

> FetchFailedException when killing speculative task
> --
>
> Key: SPARK-23816
> URL: https://issues.apache.org/jira/browse/SPARK-23816
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: chen xiao
>Priority: Major
>
> When Spark is trying to kill speculative tasks because another attempt has 
> already succeeded, sometimes the task throws 
> "org.apache.spark.shuffle.FetchFailedException: Error in opening 
> FileSegmentManagedBuffer" and the whole stage will fail.
> Other active stages will also fail with the error 
> "org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output 
> location for shuffle". Then I checked the log in the failed executor; there is no 
> error like "MetadataFetchFailedException". So they just failed with no error.
> {code:java}
> 18/03/26 23:12:09 INFO Executor: Executor is trying to kill task 2879.1 in 
> stage 4.0 (TID 13023), reason: another attempt succeeded
> 18/03/26 23:12:09 ERROR ShuffleBlockFetcherIterator: Failed to create input 
> stream from local block
> java.io.IOException: Error in opening 
> FileSegmentManagedBuffer{file=/hadoop02/yarn/local/usercache/pp_risk_grs_datamart_batch/appcache/application_1521504416249_116088/blockmgr-754a22fd-e8d6-4478-bcf8-f1d95f07f4a2/0c/shuffle_24_10_0.data,
>  offset=263687568, length=87231}
>   at 
> org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:114)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:401)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:61)
>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:104)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:103)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.channels.ClosedByInterruptException
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>   at java.io.InputStream.skip(InputStream.java:224)
>   at 
> org.spark_project.guava.io.ByteStreams.skipFully(ByteStreams.java:755)
>   at 
> 

[jira] [Commented] (SPARK-23816) FetchFailedException when killing speculative task

2018-04-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427633#comment-16427633
 ] 

Apache Spark commented on SPARK-23816:
--

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/20987

> FetchFailedException when killing speculative task
> --
>
> Key: SPARK-23816
> URL: https://issues.apache.org/jira/browse/SPARK-23816
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: chen xiao
>Priority: Major
>
> When Spark is trying to kill speculative tasks because another attempt has 
> already succeeded, sometimes the task throws 
> "org.apache.spark.shuffle.FetchFailedException: Error in opening 
> FileSegmentManagedBuffer" and the whole stage will fail.
> Other active stages will also fail with the error 
> "org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output 
> location for shuffle". Then I checked the log in the failed executor; there is no 
> error like "MetadataFetchFailedException". So they just failed with no error.
> {code:java}
> 18/03/26 23:12:09 INFO Executor: Executor is trying to kill task 2879.1 in 
> stage 4.0 (TID 13023), reason: another attempt succeeded
> 18/03/26 23:12:09 ERROR ShuffleBlockFetcherIterator: Failed to create input 
> stream from local block
> java.io.IOException: Error in opening 
> FileSegmentManagedBuffer{file=/hadoop02/yarn/local/usercache/pp_risk_grs_datamart_batch/appcache/application_1521504416249_116088/blockmgr-754a22fd-e8d6-4478-bcf8-f1d95f07f4a2/0c/shuffle_24_10_0.data,
>  offset=263687568, length=87231}
>   at 
> org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:114)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:401)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:61)
>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:104)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:103)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.channels.ClosedByInterruptException
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>   at java.io.InputStream.skip(InputStream.java:224)
>   at 
> org.spark_project.guava.io.ByteStreams.skipFully(ByteStreams.java:755)
>   at 
> 

[jira] [Assigned] (SPARK-23816) FetchFailedException when killing speculative task

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23816:


Assignee: Apache Spark

> FetchFailedException when killing speculative task
> --
>
> Key: SPARK-23816
> URL: https://issues.apache.org/jira/browse/SPARK-23816
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: chen xiao
>Assignee: Apache Spark
>Priority: Major
>
> When Spark tries to kill a speculative task because another attempt has 
> already succeeded, the task sometimes throws 
> "org.apache.spark.shuffle.FetchFailedException: Error in opening 
> FileSegmentManagedBuffer" and the whole stage fails.
> Other active stages then also fail with the error 
> "org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output 
> location for shuffle". I checked the log on the failed executor and there is no 
> error like "MetadataFetchFailedException"; those stages just failed with no error.
> {code:java}
> 18/03/26 23:12:09 INFO Executor: Executor is trying to kill task 2879.1 in 
> stage 4.0 (TID 13023), reason: another attempt succeeded
> 18/03/26 23:12:09 ERROR ShuffleBlockFetcherIterator: Failed to create input 
> stream from local block
> java.io.IOException: Error in opening 
> FileSegmentManagedBuffer{file=/hadoop02/yarn/local/usercache/pp_risk_grs_datamart_batch/appcache/application_1521504416249_116088/blockmgr-754a22fd-e8d6-4478-bcf8-f1d95f07f4a2/0c/shuffle_24_10_0.data,
>  offset=263687568, length=87231}
>   at 
> org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:114)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:401)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:61)
>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:104)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:103)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.channels.ClosedByInterruptException
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>   at java.io.InputStream.skip(InputStream.java:224)
>   at 
> org.spark_project.guava.io.ByteStreams.skipFully(ByteStreams.java:755)
>   at 
> 

[jira] [Commented] (SPARK-21274) Implement EXCEPT ALL and INTERSECT ALL

2018-04-05 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427603#comment-16427603
 ] 

Xiao Li commented on SPARK-21274:
-

[~dkbiswal] [~ioana-delaney] Could you post your design doc here?

> Implement EXCEPT ALL and INTERSECT ALL
> --
>
> Key: SPARK-21274
> URL: https://issues.apache.org/jira/browse/SPARK-21274
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> 1) *EXCEPT ALL* / MINUS ALL:
> {code}
> SELECT a, b, c FROM tab1
>  EXCEPT ALL
> SELECT a, b, c FROM tab2
> {code}
> can be rewritten as the following outer join:
> {code}
> SELECT a, b, c
> FROM tab1 t1
>      LEFT OUTER JOIN
>      tab2 t2
>      ON (
>        (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
>      )
> WHERE
>     COALESCE(t2.a, t2.b, t2.c) IS NULL
> {code}
> (register this second query as a temp view named "*t1_except_t2_df*"; it is 
> also used to find INTERSECT ALL below):
> 2) *INTERSECT ALL*:
> {code}
> SELECT a, b, c FROM tab1
>  INTERSECT ALL
> SELECT a, b, c FROM tab2
> {code}
> can be rewritten as the following anti-join using the t1_except_t2_df view 
> defined above:
> {code}
> SELECT a, b, c
> FROM tab1 t1
> WHERE
>     NOT EXISTS
>       (SELECT 1
>        FROM t1_except_t2_df e
>        WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
>       )
> {code}
> So the suggestion is just to use the above query rewrites to implement both 
> the EXCEPT ALL and INTERSECT ALL SQL set operations.
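For anyone who wants to try the proposed rewrites before a built-in operator exists, a minimal sketch is below. It simply chains the two queries from the description through the DataFrame API; the table and column names (tab1, tab2, a, b, c) are the ones from the example, the tuple comparisons are expanded into per-column equalities, and this is only an illustration of the rewrite idea, not the eventual built-in implementation.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// 1) EXCEPT ALL expressed as the outer join from the description.
val t1ExceptT2 = spark.sql("""
  SELECT t1.a, t1.b, t1.c
  FROM tab1 t1
       LEFT OUTER JOIN tab2 t2
         ON t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c
  WHERE COALESCE(t2.a, t2.b, t2.c) IS NULL
""")
t1ExceptT2.createOrReplaceTempView("t1_except_t2_df")

// 2) INTERSECT ALL expressed as the anti-join against that temp view.
val t1IntersectT2 = spark.sql("""
  SELECT a, b, c
  FROM tab1 t1
  WHERE NOT EXISTS (SELECT 1
                    FROM t1_except_t2_df e
                    WHERE t1.a = e.a AND t1.b = e.b AND t1.c = e.c)
""")
{code}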



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21274) Implement EXCEPT ALL and INTERSECT ALL

2018-04-05 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21274:

Target Version/s: 2.4.0, 3.0.0  (was: 2.4.0)

> Implement EXCEPT ALL and INTERSECT ALL
> --
>
> Key: SPARK-21274
> URL: https://issues.apache.org/jira/browse/SPARK-21274
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> 1) *EXCEPT ALL* / MINUS ALL:
> {code}
> SELECT a, b, c FROM tab1
>  EXCEPT ALL
> SELECT a, b, c FROM tab2
> {code}
> can be rewritten as the following outer join:
> {code}
> SELECT a, b, c
> FROM tab1 t1
>      LEFT OUTER JOIN
>      tab2 t2
>      ON (
>        (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
>      )
> WHERE
>     COALESCE(t2.a, t2.b, t2.c) IS NULL
> {code}
> (register this second query as a temp view named "*t1_except_t2_df*"; it is 
> also used to find INTERSECT ALL below):
> 2) *INTERSECT ALL*:
> {code}
> SELECT a, b, c FROM tab1
>  INTERSECT ALL
> SELECT a, b, c FROM tab2
> {code}
> can be rewritten as the following anti-join using the t1_except_t2_df view 
> defined above:
> {code}
> SELECT a, b, c
> FROM tab1 t1
> WHERE
>     NOT EXISTS
>       (SELECT 1
>        FROM t1_except_t2_df e
>        WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
>       )
> {code}
> So the suggestion is just to use the above query rewrites to implement both 
> the EXCEPT ALL and INTERSECT ALL SQL set operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21274) Implement EXCEPT ALL and INTERSECT ALL

2018-04-05 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21274:

Labels:   (was: set sql)

> Implement EXCEPT ALL and INTERSECT ALL
> --
>
> Key: SPARK-21274
> URL: https://issues.apache.org/jira/browse/SPARK-21274
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> 1) *EXCEPT ALL* / MINUS ALL:
> {code}
> SELECT a, b, c FROM tab1
>  EXCEPT ALL
> SELECT a, b, c FROM tab2
> {code}
> can be rewritten as the following outer join:
> {code}
> SELECT a, b, c
> FROM tab1 t1
>      LEFT OUTER JOIN
>      tab2 t2
>      ON (
>        (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
>      )
> WHERE
>     COALESCE(t2.a, t2.b, t2.c) IS NULL
> {code}
> (register this second query as a temp view named "*t1_except_t2_df*"; it is 
> also used to find INTERSECT ALL below):
> 2) *INTERSECT ALL*:
> {code}
> SELECT a, b, c FROM tab1
>  INTERSECT ALL
> SELECT a, b, c FROM tab2
> {code}
> can be rewritten as the following anti-join using the t1_except_t2_df view 
> defined above:
> {code}
> SELECT a, b, c
> FROM tab1 t1
> WHERE
>     NOT EXISTS
>       (SELECT 1
>        FROM t1_except_t2_df e
>        WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
>       )
> {code}
> So the suggestion is just to use the above query rewrites to implement both 
> the EXCEPT ALL and INTERSECT ALL SQL set operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21274) Implement EXCEPT ALL and INTERSECT ALL

2018-04-05 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21274:

Target Version/s: 2.4.0

> Implement EXCEPT ALL and INTERSECT ALL
> --
>
> Key: SPARK-21274
> URL: https://issues.apache.org/jira/browse/SPARK-21274
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> 1) *EXCEPT ALL* / MINUS ALL:
> {code}
> SELECT a, b, c FROM tab1
>  EXCEPT ALL
> SELECT a, b, c FROM tab2
> {code}
> can be rewritten as the following outer join:
> {code}
> SELECT a, b, c
> FROM tab1 t1
>      LEFT OUTER JOIN
>      tab2 t2
>      ON (
>        (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
>      )
> WHERE
>     COALESCE(t2.a, t2.b, t2.c) IS NULL
> {code}
> (register this second query as a temp view named "*t1_except_t2_df*"; it is 
> also used to find INTERSECT ALL below):
> 2) *INTERSECT ALL*:
> {code}
> SELECT a, b, c FROM tab1
>  INTERSECT ALL
> SELECT a, b, c FROM tab2
> {code}
> can be rewritten as the following anti-join using the t1_except_t2_df view 
> defined above:
> {code}
> SELECT a, b, c
> FROM tab1 t1
> WHERE
>     NOT EXISTS
>       (SELECT 1
>        FROM t1_except_t2_df e
>        WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
>       )
> {code}
> So the suggestion is just to use the above query rewrites to implement both 
> the EXCEPT ALL and INTERSECT ALL SQL set operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21274) Implement EXCEPT ALL and INTERSECT ALL

2018-04-05 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21274:

Component/s: (was: Optimizer)

> Implement EXCEPT ALL and INTERSECT ALL
> --
>
> Key: SPARK-21274
> URL: https://issues.apache.org/jira/browse/SPARK-21274
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> 1) *EXCEPT ALL* / MINUS ALL:
> {code}
> SELECT a, b, c FROM tab1
>  EXCEPT ALL
> SELECT a, b, c FROM tab2
> {code}
> can be rewritten as the following outer join:
> {code}
> SELECT a, b, c
> FROM tab1 t1
>      LEFT OUTER JOIN
>      tab2 t2
>      ON (
>        (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
>      )
> WHERE
>     COALESCE(t2.a, t2.b, t2.c) IS NULL
> {code}
> (register this second query as a temp view named "*t1_except_t2_df*"; it is 
> also used to find INTERSECT ALL below):
> 2) *INTERSECT ALL*:
> {code}
> SELECT a, b, c FROM tab1
>  INTERSECT ALL
> SELECT a, b, c FROM tab2
> {code}
> can be rewritten as the following anti-join using the t1_except_t2_df view 
> defined above:
> {code}
> SELECT a, b, c
> FROM tab1 t1
> WHERE
>     NOT EXISTS
>       (SELECT 1
>        FROM t1_except_t2_df e
>        WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
>       )
> {code}
> So the suggestion is just to use the above query rewrites to implement both 
> the EXCEPT ALL and INTERSECT ALL SQL set operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23871) add python api for VectorAssembler handleInvalid

2018-04-05 Thread yogesh garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427543#comment-16427543
 ] 

yogesh garg commented on SPARK-23871:
-

I hadn't started working on this yet. Feel free to take it.

> add python api for VectorAssembler handleInvalid
> 
>
> Key: SPARK-23871
> URL: https://issues.apache.org/jira/browse/SPARK-23871
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: yogesh garg
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23871) add python api for VectorAssembler handleInvalid

2018-04-05 Thread Huaxin Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427460#comment-16427460
 ] 

Huaxin Gao commented on SPARK-23871:


[~yogeshgarg] Are you working on this yourself? If not, may I work on this?

> add python api for VectorAssembler handleInvalid
> 
>
> Key: SPARK-23871
> URL: https://issues.apache.org/jira/browse/SPARK-23871
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: yogesh garg
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23876) OR condition in joins causes results to come back to driver

2018-04-05 Thread Aniket Arun Kulkarni (JIRA)
Aniket Arun Kulkarni created SPARK-23876:


 Summary: OR condition in joins causes results to come back to 
driver
 Key: SPARK-23876
 URL: https://issues.apache.org/jira/browse/SPARK-23876
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.6.2
Reporter: Aniket Arun Kulkarni


Hello all,

I am trying to implement an 'OR' join, something like the following:

leftDF.join(rightDF, leftDF(leftJoinKey1) === rightDF(rightJoinKey1) || 
leftDF(leftJoinKey2) === rightDF(rightJoinKey2), joinType)
.select(leftDF.col("*"), rightDF(colToFetchFromRight) as(aliasForRightColumn))

rightDF has around 400 million rows
leftDF has around 200 million rows



When I run this code, I get an error saying

"Total size of serialized results is bigger than spark.driver.maxResultSize"

This means the results are being sent back to the driver even though I did not 
call actions such as count or show.

If I remove the OR condition, the results are no longer sent to the driver and 
the job goes through.
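A common workaround (not an official fix) is sketched below: split the OR condition into two equi-joins that Spark can plan as shuffle joins, union the branches, and drop duplicate matches. The helper name and the inner-join assumption are mine; the key and column names follow the report, and on Spark 1.6 the union call is unionAll.

{code:scala}
import org.apache.spark.sql.DataFrame

// Hypothetical helper illustrating the workaround.
def orJoin(leftDF: DataFrame, rightDF: DataFrame,
           leftJoinKey1: String, rightJoinKey1: String,
           leftJoinKey2: String, rightJoinKey2: String,
           colToFetchFromRight: String, aliasForRightColumn: String): DataFrame = {
  // Each branch is a plain equi-join, so it avoids the nested-loop plan an OR
  // condition can force.
  val byKey1 = leftDF.join(rightDF, leftDF(leftJoinKey1) === rightDF(rightJoinKey1))
    .select(leftDF.col("*"), rightDF(colToFetchFromRight).as(aliasForRightColumn))
  val byKey2 = leftDF.join(rightDF, leftDF(leftJoinKey2) === rightDF(rightJoinKey2))
    .select(leftDF.col("*"), rightDF(colToFetchFromRight).as(aliasForRightColumn))
  // Rows matched by both keys appear twice; dedup after the union (note this
  // also collapses rows that were exact duplicates in the input).
  byKey1.union(byKey2).dropDuplicates()
}
{code}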



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23864) Add Unsafe* copy methods to UnsafeWriter

2018-04-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-23864:
--
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-23580

> Add Unsafe* copy methods to UnsafeWriter
> 
>
> Key: SPARK-23864
> URL: https://issues.apache.org/jira/browse/SPARK-23864
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23582) Add interpreted execution to StaticInvoke expression

2018-04-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23582.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

> Add interpreted execution to StaticInvoke expression
> 
>
> Key: SPARK-23582
> URL: https://issues.apache.org/jira/browse/SPARK-23582
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Kazuaki Ishizaki
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23593) Add interpreted execution for InitializeJavaBean expression

2018-04-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23593.
---
Resolution: Fixed

> Add interpreted execution for InitializeJavaBean expression
> ---
>
> Key: SPARK-23593
> URL: https://issues.apache.org/jira/browse/SPARK-23593
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16630) Blacklist a node if executors won't launch on it.

2018-04-05 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427377#comment-16427377
 ] 

Thomas Graves commented on SPARK-16630:
---

The problem is that spark.executor.instances (or dynamic allocation) doesn't 
necessarily represent the number of nodes in the cluster, especially with 
dynamic allocation.  Depending on the size of your nodes you can have a lot 
more executors than nodes, so this could easily end up blacklisting the entire 
cluster.  I would rather look at the actual number of nodes in the cluster.  Is 
that turning out to be hard?

> Blacklist a node if executors won't launch on it.
> -
>
> Key: SPARK-16630
> URL: https://issues.apache.org/jira/browse/SPARK-16630
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.6.2
>Reporter: Thomas Graves
>Priority: Major
>
> On YARN, it's possible that a node is broken or misconfigured such that a 
> container won't launch on it, for instance if the Spark external shuffle 
> handler didn't get loaded on it, or because of some other hardware or Hadoop 
> configuration issue.
> It would be nice if we could recognize this happening and stop trying to 
> launch executors on that node, since it could otherwise cause us to hit our 
> max number of executor failures and then kill the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23865) Not able to load file from Spark Dataframes

2018-04-05 Thread Renjith (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renjith resolved SPARK-23865.
-
Resolution: Fixed

The required import was missing; added it:

import org.apache.spark.sql.DataFrame

> Not able to load file from Spark Dataframes
> ---
>
> Key: SPARK-23865
> URL: https://issues.apache.org/jira/browse/SPARK-23865
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 2.3.0
> Environment: Executed in Atom Editor.
>Reporter: Renjith
>Priority: Major
>  Labels: newbie
>
> Hello,
> I am learning Spark and trying examples as part of that. I am using the 
> following lines of code in my file named df.scala:
> import org.apache.spark.sql.SparkSession
>  val spark = SparkSession.builder().getOrCreate()
>  val df = spark.read.csv("CitiGroup2006_2008")
>  df.Head(5)
> In my Scala Terminal:
> scala> :load df.scala
> Loading df.scala...
>  import org.apache.spark.sql.SparkSession
>  spark: org.apache.spark.sql.SparkSession = 
> org.apache.spark.sql.SparkSession@4756e5cc
>  org.apache.spark.sql.AnalysisException: Path does not exist: 
> [file:/C:/Spark/MyPrograms/Scala_and_Spark_Bootcamp_master/SparkD|file:///C:/Spark/MyPrograms/Scala_and_Spark_Bootcamp_master/SparkD]
>  ataFrames/CitiGroup2006_2008;
>  at 
> org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGl
>  obPathIfNecessary(DataSource.scala:715)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at scala.collection.immutable.List.foreach(List.scala:381)
>  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>  at scala.collection.immutable.List.flatMap(List.scala:344)
> at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:388)
>  at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
>  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
>  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:594)
>  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:473)
>  ... 72 elided
>  :25: error: not found: value df
>  df.Head(5)
>  ^
> All environment variables are set and point to the right locations. Is this a 
> version issue with Spark 2.3.0, or should I downgrade? If so, please let me 
> know which version is stable for my practice exercises.
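For reference, a minimal corrected version of the script is below. The only assumptions are that the lower-case {{head}} method was intended and that the CSV path has to point at a location that actually exists (relative paths resolve against the directory the shell was started from).

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
// Path must exist relative to the shell's working directory.
val df = spark.read.csv("CitiGroup2006_2008")
df.head(5)   // DataFrame has head(n) (or show(n)); there is no Head(n)
{code}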



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-23865) Not able to load file from Spark Dataframes

2018-04-05 Thread Renjith (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renjith closed SPARK-23865.
---

> Not able to load file from Spark Dataframes
> ---
>
> Key: SPARK-23865
> URL: https://issues.apache.org/jira/browse/SPARK-23865
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 2.3.0
> Environment: Executed in Atom Editor.
>Reporter: Renjith
>Priority: Major
>  Labels: newbie
>
> Hello,
> I am learning Spark and trying examples as part of that. I am using the 
> following lines of code in my file named df.scala:
> import org.apache.spark.sql.SparkSession
>  val spark = SparkSession.builder().getOrCreate()
>  val df = spark.read.csv("CitiGroup2006_2008")
>  df.Head(5)
> In my Scala Terminal:
> scala> :load df.scala
> Loading df.scala...
>  import org.apache.spark.sql.SparkSession
>  spark: org.apache.spark.sql.SparkSession = 
> org.apache.spark.sql.SparkSession@4756e5cc
>  org.apache.spark.sql.AnalysisException: Path does not exist: 
> [file:/C:/Spark/MyPrograms/Scala_and_Spark_Bootcamp_master/SparkD|file:///C:/Spark/MyPrograms/Scala_and_Spark_Bootcamp_master/SparkD]
>  ataFrames/CitiGroup2006_2008;
>  at 
> org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGl
>  obPathIfNecessary(DataSource.scala:715)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at scala.collection.immutable.List.foreach(List.scala:381)
>  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>  at scala.collection.immutable.List.flatMap(List.scala:344)
> at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:388)
>  at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
>  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
>  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:594)
>  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:473)
>  ... 72 elided
>  :25: error: not found: value df
>  df.Head(5)
>  ^
> All environment variables are set and point to the right locations. Is this a 
> version issue with Spark 2.3.0, or should I downgrade? If so, please let me 
> know which version is stable for my practice exercises.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.9.0

2018-04-05 Thread shane knapp (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427265#comment-16427265
 ] 

shane knapp commented on SPARK-23874:
-

Yeah, we'll need to update these on the Jenkins workers.

Will we be doing this for Python 2.7, 3.5, or both?

> Upgrade apache/arrow to 0.9.0
> -
>
> Key: SPARK-23874
> URL: https://issues.apache.org/jira/browse/SPARK-23874
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
>
> Version 0.9.0 of apache arrow comes with a bug fix related to array 
> serialization. 
> https://issues.apache.org/jira/browse/ARROW-1973



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23264) Support interval values without INTERVAL clauses

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23264:


Assignee: Apache Spark  (was: Takeshi Yamamuro)

> Support interval values without INTERVAL clauses
> 
>
> Key: SPARK-23264
> URL: https://issues.apache.org/jira/browse/SPARK-23264
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Takeshi Yamamuro
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 2.3.1, 2.4.0
>
>
> The master currently cannot parse the SQL query below;
> {code:java}
> SELECT cast('2017-08-04' as date) + 1 days;
> {code}
> Since other DBMS-like systems support this syntax (e.g., Hive and MySQL), it 
> might help to support it in Spark.
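For context, the explicit INTERVAL form already parses on current Spark; the shorthand from the description is what this ticket adds. A small sketch (not taken from the patch):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
// Works today:
spark.sql("SELECT cast('2017-08-04' as date) + interval 1 day").show()
// Proposed by this ticket:
// spark.sql("SELECT cast('2017-08-04' as date) + 1 days").show()
{code}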



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23264) Support interval values without INTERVAL clauses

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23264:


Assignee: Takeshi Yamamuro  (was: Apache Spark)

> Support interval values without INTERVAL clauses
> 
>
> Key: SPARK-23264
> URL: https://issues.apache.org/jira/browse/SPARK-23264
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 2.3.1, 2.4.0
>
>
> The master currently cannot parse the SQL query below;
> {code:java}
> SELECT cast('2017-08-04' as date) + 1 days;
> {code}
> Since other DBMS-like systems support this syntax (e.g., Hive and MySQL), it 
> might help to support it in Spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23864) Add Unsafe* copy methods to UnsafeWriter

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23864:


Assignee: Herman van Hovell  (was: Apache Spark)

> Add Unsafe* copy methods to UnsafeWriter
> 
>
> Key: SPARK-23864
> URL: https://issues.apache.org/jira/browse/SPARK-23864
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23864) Add Unsafe* copy methods to UnsafeWriter

2018-04-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427204#comment-16427204
 ] 

Apache Spark commented on SPARK-23864:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/20986

> Add Unsafe* copy methods to UnsafeWriter
> 
>
> Key: SPARK-23864
> URL: https://issues.apache.org/jira/browse/SPARK-23864
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23864) Add Unsafe* copy methods to UnsafeWriter

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23864:


Assignee: Apache Spark  (was: Herman van Hovell)

> Add Unsafe* copy methods to UnsafeWriter
> 
>
> Key: SPARK-23864
> URL: https://issues.apache.org/jira/browse/SPARK-23864
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Herman van Hovell
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.9.0

2018-04-05 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427201#comment-16427201
 ] 

Xiao Li commented on SPARK-23874:
-

[~bryanc] You did the update to 0.8.0 last time 
(https://github.com/apache/spark/pull/19884). Could you help with this upgrade too?

> Upgrade apache/arrow to 0.9.0
> -
>
> Key: SPARK-23874
> URL: https://issues.apache.org/jira/browse/SPARK-23874
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
>
> Version 0.9.0 of apache arrow comes with a bug fix related to array 
> serialization. 
> https://issues.apache.org/jira/browse/ARROW-1973



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16630) Blacklist a node if executors won't launch on it.

2018-04-05 Thread Attila Zsolt Piros (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427181#comment-16427181
 ] 

Attila Zsolt Piros commented on SPARK-16630:


[~tgraves] What about stopping YARN blacklisting when a configured limit is 
reached, with a default of "spark.executor.instances" * 
"spark.yarn.blacklisting.default.executor.instances.size.weight" (a better name 
for the weight is welcome)? The limit would count all blacklisted nodes, 
including nodes blacklisted at the stage and task level, and in the case of 
dynamic allocation the default would be Int.MaxValue, so there would be no 
limit at all.

This idea comes from the calculation of the default for maxNumExecutorFailures.
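A rough sketch of the proposed limit calculation is below. The weight key is only the placeholder name suggested in the comment, not an existing Spark configuration, and the defaults are illustrative.

{code:scala}
import org.apache.spark.SparkConf

// Hedged sketch of the proposal: cap YARN node blacklisting at
// spark.executor.instances * weight, with no cap under dynamic allocation.
def maxBlacklistedNodes(conf: SparkConf): Int = {
  if (conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
    Int.MaxValue  // dynamic allocation: effectively no limit
  } else {
    val instances = conf.getInt("spark.executor.instances", 2)
    // Placeholder key name taken from the comment above.
    val weight = conf.getDouble(
      "spark.yarn.blacklisting.default.executor.instances.size.weight", 1.0)
    math.max(1, (instances * weight).toInt)
  }
}
{code}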

> Blacklist a node if executors won't launch on it.
> -
>
> Key: SPARK-16630
> URL: https://issues.apache.org/jira/browse/SPARK-16630
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.6.2
>Reporter: Thomas Graves
>Priority: Major
>
> On YARN, it's possible that a node is broken or misconfigured such that a 
> container won't launch on it, for instance if the Spark external shuffle 
> handler didn't get loaded on it, or because of some other hardware or Hadoop 
> configuration issue.
> It would be nice if we could recognize this happening and stop trying to 
> launch executors on that node, since it could otherwise cause us to hit our 
> max number of executor failures and then kill the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16630) Blacklist a node if executors won't launch on it.

2018-04-05 Thread Attila Zsolt Piros (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427123#comment-16427123
 ] 

Attila Zsolt Piros commented on SPARK-16630:


[~irashid] I would reuse spark.blacklist.application.maxFailedExecutorsPerNode, 
which already has a default of 2. I think it makes sense to use the same limit 
for per-node failures before adding the node to the blacklist. But in this case 
I cannot factor spark.yarn.max.executor.failures into the default.

> Blacklist a node if executors won't launch on it.
> -
>
> Key: SPARK-16630
> URL: https://issues.apache.org/jira/browse/SPARK-16630
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.6.2
>Reporter: Thomas Graves
>Priority: Major
>
> On YARN, it's possible that a node is broken or misconfigured such that a 
> container won't launch on it, for instance if the Spark external shuffle 
> handler didn't get loaded on it, or because of some other hardware or Hadoop 
> configuration issue.
> It would be nice if we could recognize this happening and stop trying to 
> launch executors on that node, since it could otherwise cause us to hit our 
> max number of executor failures and then kill the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23465) Dataset.withAllColumnsRenamed should map all column names to a new one

2018-04-05 Thread Mihaly Toth (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihaly Toth resolved SPARK-23465.
-
Resolution: Won't Fix

Based on the PR feedback I conclude that this functionality is not really needed.

> Dataset.withAllColumnsRenamed should map all column names to a new one
> --
>
> Key: SPARK-23465
> URL: https://issues.apache.org/jira/browse/SPARK-23465
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Mihaly Toth
>Priority: Minor
>
> Currently one can only rename columns one by one using the 
> {{withColumnRenamed()}} function. When one would like to rename all or most 
> of the columns, it would be easier to specify an algorithm for mapping from 
> the old name to the new one (like prefixing) than to iterate over all the fields.
> An example use case is joining to a Dataset with the same or similar schema 
> (a special case is self-joining) where the names are the same or overlapping; 
> such a joined Dataset would fail at {{saveAsTable()}}.
> With the new function this would be as easy as:
> {code:java}
> ds.withAllColumnsRenamed("prefix" + _)
> {code}
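Until such an API exists, two equivalent workarounds with the current API are sketched below (the helper name and the "prefix_" mapping are just examples):

{code:scala}
import org.apache.spark.sql.DataFrame

// Fold withColumnRenamed over every column...
def renameAll(ds: DataFrame)(rename: String => String): DataFrame =
  ds.columns.foldLeft(ds)((d, c) => d.withColumnRenamed(c, rename(c)))

// ...or rename everything positionally in a single call:
// ds.toDF(ds.columns.map("prefix_" + _): _*)
{code}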



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23593) Add interpreted execution for InitializeJavaBean expression

2018-04-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427057#comment-16427057
 ] 

Apache Spark commented on SPARK-23593:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/20985

> Add interpreted execution for InitializeJavaBean expression
> ---
>
> Key: SPARK-23593
> URL: https://issues.apache.org/jira/browse/SPARK-23593
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23875) Create IndexedSeq wrapper for ArrayData

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23875:


Assignee: (was: Apache Spark)

> Create IndexedSeq wrapper for ArrayData
> ---
>
> Key: SPARK-23875
> URL: https://issues.apache.org/jira/browse/SPARK-23875
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Priority: Major
>
> We don't have a good way to sequentially access {{UnsafeArrayData}} with a 
> common interface such as Seq. An example is {{MapObject}} where we need to 
> access several sequence collection types together. But {{UnsafeArrayData}} 
> doesn't implement {{ArrayData.array}}. Calling {{toArray}} will copy the 
> entire array. We can provide an {{IndexedSeq}} wrapper for {{ArrayData}}, so 
> we can avoid copying the entire array.
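A minimal sketch of what such a wrapper could look like is below; it is illustrative only, not the code from the pull request, and it assumes the element {{DataType}} is known to the caller.

{code:scala}
import org.apache.spark.sql.catalyst.util.ArrayData
import org.apache.spark.sql.types.DataType

// IndexedSeq view over ArrayData: elements are read one at a time through
// get(ordinal, dataType), so the backing array is never copied.
class ArrayDataIndexedSeq(data: ArrayData, elementType: DataType)
    extends IndexedSeq[Any] {
  override def length: Int = data.numElements()
  override def apply(i: Int): Any = data.get(i, elementType)
}
{code}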



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23875) Create IndexedSeq wrapper for ArrayData

2018-04-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427034#comment-16427034
 ] 

Apache Spark commented on SPARK-23875:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/20984

> Create IndexedSeq wrapper for ArrayData
> ---
>
> Key: SPARK-23875
> URL: https://issues.apache.org/jira/browse/SPARK-23875
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Priority: Major
>
> We don't have a good way to sequentially access {{UnsafeArrayData}} with a 
> common interface such as Seq. An example is {{MapObject}} where we need to 
> access several sequence collection types together. But {{UnsafeArrayData}} 
> doesn't implement {{ArrayData.array}}. Calling {{toArray}} will copy the 
> entire array. We can provide an {{IndexedSeq}} wrapper for {{ArrayData}}, so 
> we can avoid copying the entire array.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23875) Create IndexedSeq wrapper for ArrayData

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23875:


Assignee: Apache Spark

> Create IndexedSeq wrapper for ArrayData
> ---
>
> Key: SPARK-23875
> URL: https://issues.apache.org/jira/browse/SPARK-23875
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> We don't have a good way to sequentially access {{UnsafeArrayData}} with a 
> common interface such as Seq. An example is {{MapObject}} where we need to 
> access several sequence collection types together. But {{UnsafeArrayData}} 
> doesn't implement {{ArrayData.array}}. Calling {{toArray}} will copy the 
> entire array. We can provide an {{IndexedSeq}} wrapper for {{ArrayData}}, so 
> we can avoid copying the entire array.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23875) Create IndexedSeq wrapper for ArrayData

2018-04-05 Thread Liang-Chi Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-23875:

Description: We don't have a good way to sequentially access 
{{UnsafeArrayData}} with a common interface such as Seq. An example is 
{{MapObject}} where we need to access several sequence collection types 
together. But {{UnsafeArrayData}} doesn't implement {{ArrayData.array}}. 
Calling {{toArray}} will copy the entire array. We can provide an 
{{IndexedSeq}} wrapper for {{ArrayData}}, so we can avoid copying the entire 
array.  (was: We don't have a good way to sequentially access 
{{UnsafeArrayData}} with a common interface such as Seq. An example is 
{{MapObject}} where we need to access several sequence collection types 
together. But {{UnsafeArrayData}} doesn't implement {{ArrayData.array}}. We can 
provide an {{IndexedSeq}} wrapper for {{ArrayData}}, so we can avoid copying 
the entire array.)

> Create IndexedSeq wrapper for ArrayData
> ---
>
> Key: SPARK-23875
> URL: https://issues.apache.org/jira/browse/SPARK-23875
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Priority: Major
>
> We don't have a good way to sequentially access {{UnsafeArrayData}} with a 
> common interface such as Seq. An example is {{MapObject}} where we need to 
> access several sequence collection types together. But {{UnsafeArrayData}} 
> doesn't implement {{ArrayData.array}}. Calling {{toArray}} will copy the 
> entire array. We can provide an {{IndexedSeq}} wrapper for {{ArrayData}}, so 
> we can avoid copying the entire array.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2018-04-05 Thread Randy Tidd (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427019#comment-16427019
 ] 

Randy Tidd commented on SPARK-12216:


This issue is not resolved; please fix it.  I am hitting it running Spark 
on Windows, with spark-core 2.3.0 and Scala 2.11.  We run Spark jobs on Windows 
laptops via scalatest to perform unit and integration tests, and this is an 
important aspect of our development.  I see the error when 
org.apache.spark.util.ShutdownHookManager is trying to delete a temp directory. 
As Brian noted above, this is a problem because it leaves behind temp files 
that pile up and use up disk space, and it clutters the logs, making developers 
believe there has been a fatal error.

{code:java}
2018-04-05 10:11:16 [pool-7-thread-1] ERROR o.a.spark.util.ShutdownHookManager - Exception while deleting Spark temp dir: C:\Users\\AppData\Local\Temp\spark-8a2f4434-6533-4dbb-98e4-58565f0044bc
java.io.IOException: Failed to delete: C:\Users\\AppData\Local\Temp\spark-8a2f4434-6533-4dbb-98e4-58565f0044bc
    at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1070)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1988)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
>   

[jira] [Created] (SPARK-23875) Create IndexedSeq wrapper for ArrayData

2018-04-05 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-23875:
---

 Summary: Create IndexedSeq wrapper for ArrayData
 Key: SPARK-23875
 URL: https://issues.apache.org/jira/browse/SPARK-23875
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Liang-Chi Hsieh


We don't have a good way to sequentially access {{UnsafeArrayData}} with a 
common interface such as Seq. An example is {{MapObject}} where we need to 
access several sequence collection types together. But {{UnsafeArrayData}} 
doesn't implement {{ArrayData.array}}. We can provide an {{IndexedSeq}} wrapper 
for {{ArrayData}}, so we can avoid copying the entire array.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23747) Add EpochCoordinator unit tests

2018-04-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427000#comment-16427000
 ] 

Apache Spark commented on SPARK-23747:
--

User 'efimpoberezkin' has created a pull request for this issue:
https://github.com/apache/spark/pull/20983

> Add EpochCoordinator unit tests
> ---
>
> Key: SPARK-23747
> URL: https://issues.apache.org/jira/browse/SPARK-23747
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23747) Add EpochCoordinator unit tests

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23747:


Assignee: Apache Spark

> Add EpochCoordinator unit tests
> ---
>
> Key: SPARK-23747
> URL: https://issues.apache.org/jira/browse/SPARK-23747
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23747) Add EpochCoordinator unit tests

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23747:


Assignee: (was: Apache Spark)

> Add EpochCoordinator unit tests
> ---
>
> Key: SPARK-23747
> URL: https://issues.apache.org/jira/browse/SPARK-23747
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23593) Add interpreted execution for InitializeJavaBean expression

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23593:


Assignee: Apache Spark  (was: Liang-Chi Hsieh)

> Add interpreted execution for InitializeJavaBean expression
> ---
>
> Key: SPARK-23593
> URL: https://issues.apache.org/jira/browse/SPARK-23593
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Apache Spark
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23593) Add interpreted execution for InitializeJavaBean expression

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23593:


Assignee: Liang-Chi Hsieh  (was: Apache Spark)

> Add interpreted execution for InitializeJavaBean expression
> ---
>
> Key: SPARK-23593
> URL: https://issues.apache.org/jira/browse/SPARK-23593
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.9.0

2018-04-05 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426857#comment-16426857
 ] 

Hyukjin Kwon commented on SPARK-23874:
--

If anyone takes this up, I will help review. If it stays open for a long while, I will 
try it myself. One quick note: this also needs a Jenkins update, so I believe we should 
cc [~shaneknapp] too.

> Upgrade apache/arrow to 0.9.0
> -
>
> Key: SPARK-23874
> URL: https://issues.apache.org/jira/browse/SPARK-23874
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
>
> Version 0.9.0 of apache arrow comes with a bug fix related to array 
> serialization. 
> https://issues.apache.org/jira/browse/ARROW-1973



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-23593) Add interpreted execution for InitializeJavaBean expression

2018-04-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reopened SPARK-23593:
---

The PR was merged prematurely.

> Add interpreted execution for InitializeJavaBean expression
> ---
>
> Key: SPARK-23593
> URL: https://issues.apache.org/jira/browse/SPARK-23593
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23593) Add interpreted execution for InitializeJavaBean expression

2018-04-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-23593:
-

Assignee: Liang-Chi Hsieh

> Add interpreted execution for InitializeJavaBean expression
> ---
>
> Key: SPARK-23593
> URL: https://issues.apache.org/jira/browse/SPARK-23593
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23593) Add interpreted execution for InitializeJavaBean expression

2018-04-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23593.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

> Add interpreted execution for InitializeJavaBean expression
> ---
>
> Key: SPARK-23593
> URL: https://issues.apache.org/jira/browse/SPARK-23593
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23859) Initial PR for Instrumentation improvements: UUID and logging levels

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23859:


Assignee: Weichen Xu  (was: Apache Spark)

> Initial PR for Instrumentation improvements: UUID and logging levels
> 
>
> Key: SPARK-23859
> URL: https://issues.apache.org/jira/browse/SPARK-23859
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: Joseph K. Bradley
>Assignee: Weichen Xu
>Priority: Major
>
> This is a subtask for an initial PR to improve MLlib's Instrumentation class 
> for logging.  It will address a couple of issues and use the changes in 
> LogisticRegression as an example class.
> Issues:
> * The UUID is currently generated from an atomic integer.  This is a problem 
> since the integer is reset whenever a persisted Estimator is loaded on a new 
> cluster.  We should just use a random UUID to get a new UUID each time with 
> high probability.
> * We use both Instrumentation and Logging to log stuff.  Let's standardize 
> around Instrumentation in MLlib since it can associate logs with the 
> Estimator or Transformer which produced the logs (via a prefix with the 
> algorithm's name or UUID).
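As a hedged illustration of the random-UUID idea described above (the class and method
names below are invented for this sketch and are not MLlib's actual Instrumentation API):

{code}
import java.util.UUID

// Illustrative stand-in for the proposed logging prefix: build a per-fit
// identifier from the algorithm name plus a random UUID, so identifiers stay
// unique with high probability even after a persisted Estimator is reloaded
// on a different cluster, unlike an atomic counter that restarts from zero.
class InstrumentationSketch(algoName: String) {
  private val uid = UUID.randomUUID().toString
  private val prefix = s"$algoName-$uid: "

  def logInfo(msg: String): Unit = println(prefix + msg)
}

object InstrumentationSketchDemo extends App {
  val instr = new InstrumentationSketch("LogisticRegression")
  instr.logInfo("training started")
  // prints e.g. "LogisticRegression-<random uuid>: training started"
}
{code}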



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23859) Initial PR for Instrumentation improvements: UUID and logging levels

2018-04-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426727#comment-16426727
 ] 

Apache Spark commented on SPARK-23859:
--

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/20982

> Initial PR for Instrumentation improvements: UUID and logging levels
> 
>
> Key: SPARK-23859
> URL: https://issues.apache.org/jira/browse/SPARK-23859
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: Joseph K. Bradley
>Assignee: Weichen Xu
>Priority: Major
>
> This is a subtask for an initial PR to improve MLlib's Instrumentation class 
> for logging.  It will address a couple of issues and use the changes in 
> LogisticRegression as an example class.
> Issues:
> * The UUID is currently generated from an atomic integer.  This is a problem 
> since the integer is reset whenever a persisted Estimator is loaded on a new 
> cluster.  We should just use a random UUID to get a new UUID each time with 
> high probability.
> * We use both Instrumentation and Logging to log stuff.  Let's standardize 
> around Instrumentation in MLlib since it can associate logs with the 
> Estimator or Transformer which produced the logs (via a prefix with the 
> algorithm's name or UUID).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23859) Initial PR for Instrumentation improvements: UUID and logging levels

2018-04-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23859:


Assignee: Apache Spark  (was: Weichen Xu)

> Initial PR for Instrumentation improvements: UUID and logging levels
> 
>
> Key: SPARK-23859
> URL: https://issues.apache.org/jira/browse/SPARK-23859
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Major
>
> This is a subtask for an initial PR to improve MLlib's Instrumentation class 
> for logging.  It will address a couple of issues and use the changes in 
> LogisticRegression as an example class.
> Issues:
> * The UUID is currently generated from an atomic integer.  This is a problem 
> since the integer is reset whenever a persisted Estimator is loaded on a new 
> cluster.  We should just use a random UUID to get a new UUID each time with 
> high probability.
> * We use both Instrumentation and Logging to log stuff.  Let's standardize 
> around Instrumentation in MLlib since it can associate logs with the 
> Estimator or Transformer which produced the logs (via a prefix with the 
> algorithm's name or UUID).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12823) Cannot create UDF with StructType input

2018-04-05 Thread todesking (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1642#comment-1642
 ] 

todesking commented on SPARK-12823:
---

I understand that Spark does not currently support struct/array types as UDF arguments.

But if so, Spark should reject such UDFs at definition time.

Instead, Spark accepts these UDFs, passes the type checks during query planning, and 
then dies unexpectedly at execution time. It is a pitfall.

> Cannot create UDF with StructType input
> ---
>
> Key: SPARK-12823
> URL: https://issues.apache.org/jira/browse/SPARK-12823
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Frank Rosner
>Priority: Major
>
> h5. Problem
> It is not possible to apply a UDF to a column that has a struct data type. 
> Two previous requests to the mailing list remained unanswered.
> h5. How-To-Reproduce
> {code}
> val sql = new org.apache.spark.sql.SQLContext(sc)
> import sql.implicits._
> case class KV(key: Long, value: String)
> case class Row(kv: KV)
> val df = sc.parallelize(List(Row(KV(1L, "a")), Row(KV(5L, "b")))).toDF
> val udf1 = org.apache.spark.sql.functions.udf((kv: KV) => kv.value)
> df.select(udf1(df("kv"))).show
> // java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast 
> to $line78.$read$$iwC$$iwC$KV
> val udf2 = org.apache.spark.sql.functions.udf((kv: (Long, String)) => kv._2)
> df.select(udf2(df("kv"))).show
> // org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(kv)' due to 
> data type mismatch: argument 1 requires struct<_1:bigint,_2:string> type, 
> however, 'kv' is of struct type.;
> {code}
> h5. Mailing List Entries
> - 
> https://mail-archives.apache.org/mod_mbox/spark-user/201511.mbox/%3CCACUahd8M=ipCbFCYDyein_=vqyoantn-tpxe6sq395nh10g...@mail.gmail.com%3E
> - https://www.mail-archive.com/user@spark.apache.org/msg43092.html
> h5. Possible Workaround
> If you create a {{UserDefinedFunction}} manually, not using the {{udf}} 
> helper functions, it works. See https://github.com/FRosner/struct-udf, which 
> exposes the {{UserDefinedFunction}} constructor (public from package 
> private). However, then you have to work with a {{Row}}, because it does not 
> automatically convert the row to a case class / tuple.
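A minimal sketch of that Row-based approach (my own illustration, not the reporter's
code; it relies on the untyped {{udf(f, returnType)}} overload available in Spark 2.x,
so no encoder for a case-class argument is needed):

{code}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{struct, udf}
import org.apache.spark.sql.types.StringType

object StructUdfSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("struct-udf").getOrCreate()
  import spark.implicits._

  // Build a DataFrame with a single struct column "kv" (fields: key, value).
  val df = Seq((1L, "a"), (5L, "b")).toDF("key", "value")
    .select(struct($"key", $"value").as("kv"))

  // The UDF receives the struct as a generic Row and extracts the field by
  // name; the untyped udf(f, returnType) overload avoids needing an encoder
  // for a case-class argument.
  val extractValue = udf((kv: Row) => kv.getAs[String]("value"), StringType)

  df.select(extractValue($"kv").as("value")).show()
  // +-----+
  // |value|
  // +-----+
  // |    a|
  // |    b|
  // +-----+

  spark.stop()
}
{code}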



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

 ### Run info result in spark 2.1.0 ###
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |B_FIELD|
 +-------+
 |      B|
 +-------+
 ..

 ### Run info result in spark 2.3.0 ###
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  --## Run info result in spark 2.1.0 ###--
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
 ---> Run info result in spark 2.3.0 <
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
>  ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  *## Run info result in spark 2.1.0 ###
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
 ---> Run info result in spark 2.3.0 <
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  ## Run info result in spark 2.1.0 ###
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
 ---> Run info result in spark 2.3.0 <
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
>  ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  --## Run info result in spark 2.1.0 ###--
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
 ---> Run info result in spark 2.3.0 <
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  ---> Run info result in spark 2.1.0 <
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
 ---> Run info result in spark 2.3.0 <
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
>  ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  .config("hive.metastore.uris", 

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

  $## Run Source Code  $##
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  $## Run info result in spark 2.1.0  $##
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
  $## Run info result in spark 2.3.0  $##
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  $## Run info result in spark 2.1.0  $##
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
  $## Run info result in spark 2.3.0  $##
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
>   $## Run Source Code  $##
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  $## Run info result in spark 2.1.0  $##
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
  $## Run info result in spark 2.3.0  $##
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  $## Run info result in spark 2.1.0 ###
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
 ---> Run info result in spark 2.3.0 <
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
>  ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  $## Run info result in spark 2.1.0 ###
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
 ---> Run info result in spark 2.3.0 <
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  *## Run info result in spark 2.1.0 ###
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
 ---> Run info result in spark 2.3.0 <
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
>  ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  ---> Run info result in spark 2.1.0 <
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
 ---> Run info result in spark 2.3.0 <
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  ---> Run info result in spark 2.1.0 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
 ---> Run info result in spark 2.3.0
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
>  ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  .config("hive.metastore.uris", "thrift://HOST_A:9083")
>  .getOrCreate()
> 

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  ---> Run info result in spark 2.1.0 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..
 ---> Run info result in spark 2.3.0
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  ---> Run info result in spark 2.1.0 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..

 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
>  ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  .config("hive.metastore.uris", "thrift://HOST_A:9083")
>  .getOrCreate()
> spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()
> 

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  ### Run info result in spark 2.1.0 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..

 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

 Run info result in spark 2.1.0 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..

 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
>  ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  .config("hive.metastore.uris", "thrift://HOST_A:9083")
>  .getOrCreate()
> spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()
> SparkSession.clearActiveSession()
>  

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  ---> Run info result in spark 2.1.0 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..

 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

 ** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

  ### Run info result in spark 2.1.0 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..

 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
>  ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  .config("hive.metastore.uris", "thrift://HOST_A:9083")
>  .getOrCreate()
> spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()
> SparkSession.clearActiveSession()
>  

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

 Run info result in spark 2.1.0 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..

** Run info result in spark 2.3.0 
**
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

### Run info result in spark 2.1.0 ###
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / B_FIELD /
 ---
 /       B       /
 
 ..

** Run info result in spark 2.3.0 
**
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 + +
 / A_FIELD /
 ---
 /       A       /
 
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
> ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
> 
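
A minimal workaround sketch for the behavior described above (an untested assumption, not a verified fix; the object name and structure are hypothetical): instead of only clearing the active/default session, stop the first session's SparkContext before building the second one, so the builder cannot hand back the context that already holds HOST_A's Hive metastore client.

// Hypothetical sketch: isolate the two metastore connections by stopping the
// first SparkContext before creating the second session. Untested assumption;
// the shared external catalog may still be cached per JVM in 2.3.0.
import org.apache.spark.sql.SparkSession

object TwoMetastoresSketch {
  def main(args: Array[String]): Unit = {
    val spark1 = SparkSession.builder()
      .enableHiveSupport()
      .config("hive.metastore.uris", "thrift://HOST_A:9083")
      .getOrCreate()
    spark1.sql("SELECT A_FIELD FROM TABLE_A").show()

    // stop() tears down the SparkContext as well, so getOrCreate() below
    // cannot reuse the context that already resolved HOST_A's metastore.
    spark1.stop()

    val spark2 = SparkSession.builder()
      .enableHiveSupport()
      .config("hive.metastore.uris", "thrift://HOST_B:9083")
      .getOrCreate()
    spark2.sql("SELECT B_FIELD FROM TABLE_B").show()
    spark2.stop()
  }
}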

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

### Run info result in spark 2.1.0 ###
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |B_FIELD|
 +-------+
 |      B|
 +-------+
 ..

** Run info result in spark 2.3.0 **
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

** Run info result in spark 2.1.0 **
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |B_FIELD|
 +-------+
 |      B|
 +-------+
 ..

** Run info result in spark 2.3.0 **
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
> ** Run Source Code **
>  val 
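
A small diagnostic sketch (an assumption about how to observe the problem, not part of the reporter's code): continuing from the snippet above, after building spark_2 one can list the tables the session actually resolves; if the shared Hive client still points at HOST_A, TABLE_B will be absent even though HOST_B was requested.

// Hypothetical check: which catalog does the second session really see?
spark_2.catalog.listTables("default").show(truncate = false)
// Expected with the 2.3.0 behavior above: HOST_A's tables (e.g. TABLE_A) are
// listed while TABLE_B is missing, matching the AnalysisException in the log.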

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

** Run info result in spark 2.1.0 **
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |B_FIELD|
 +-------+
 |      B|
 +-------+
 ..

** Run info result in spark 2.3.0 **
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

** Run info result in spark 2.1.0 **
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |B_FIELD|
 +-------+
 |      B|
 +-------+
 ..

** Run info result in spark 2.3.0 **
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
> ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  .config("hive.metastore.uris", "thrift://HOST_A:9083")
>  .getOrCreate()
> spark_1.sql("SELECT A_FIELD FROM 

[jira] [Updated] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2018-04-05 Thread Park Chan Min (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Park Chan Min updated SPARK-23872:
--
Description: 
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

** Run info result in spark 2.1.0 **
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |B_FIELD|
 +-------+
 |      B|
 +-------+
 ..

** Run info result in spark 2.3.0 **
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**

  was:
In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
metastore information is used when the second session is run.

** Run Source Code **
 val spark_1 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_A:9083")
 .getOrCreate()

spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

SparkSession.clearActiveSession()
 SparkSession.clearDefaultSession()

val spark_2 = SparkSession.builder()
 .enableHiveSupport()
 .config("hive.metastore.uris", "thrift://HOST_B:9083")
 .getOrCreate()

spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()

** Run info result in spark 2.1.0 **
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_B{color}*:9083
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |B_FIELD|
 +-------+
 |      B|
 +-------+
 ..
** Run info result in spark 2.3.0 **
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
0.807905 s
 +-------+
 |A_FIELD|
 +-------+
 |      A|
 +-------+
 ..
 INFO metastore: Trying to connect to metastore with URI 
thrift://*{color:#d04437}HOST_A{color}*:9083
 ..
 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
view not found: `default`.`TABLE_B`; line 1 pos 19;

**


> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Park Chan Min
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
> ** Run Source Code **
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  .config("hive.metastore.uris", "thrift://HOST_A:9083")
>  .getOrCreate()
> spark_1.sql("SELECT A_FIELD 
