Re: Missing HiveConf when starting PySpark from head
Yes, my bad. The code in session.py needs to also catch TypeError like before.

On Thu, Jun 14, 2018 at 11:03 AM, Li Jin wrote:
> Sounds good. Thanks all for the quick reply.
>
> https://issues.apache.org/jira/browse/SPARK-24563

-- Marcelo
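A minimal sketch of the fallback described above, for context. It reuses the _create_shell_session name and the HiveConf probe that appear in the traceback later in the thread, but the body is only an illustration of the idea (probe HiveConf, fall back to a non-Hive session when the call raises TypeError), not the actual patch:

    import warnings

    from py4j.protocol import Py4JError
    from pyspark.conf import SparkConf
    from pyspark.context import SparkContext
    from pyspark.sql import SparkSession


    def _create_shell_session():
        # Sketch only. Assumes the JVM gateway is already up, as it is once
        # bin/pyspark has launched the gateway for the shell.
        conf = SparkConf()
        try:
            # On a build without Hive, org.apache.hadoop.hive.conf.HiveConf is
            # not on the classpath; the py4j proxy is then a plain JavaPackage,
            # and calling it raises TypeError rather than Py4JError.
            SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
            return SparkSession.builder.enableHiveSupport().getOrCreate()
        except (Py4JError, TypeError):
            if conf.get("spark.sql.catalogImplementation", "hive").lower() == "hive":
                warnings.warn("Falling back to non-Hive support because HiveConf "
                              "is not accessible; build Spark with Hive to enable it.")
            return SparkSession.builder.getOrCreate()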
Re: Missing HiveConf when starting PySpark from head
Sounds good. Thanks all for the quick reply.

https://issues.apache.org/jira/browse/SPARK-24563

On Thu, Jun 14, 2018 at 12:19 PM, Xiao Li wrote:
> Thanks for catching this. Please feel free to submit a PR. I do not think
> Vanzin wants to introduce the behavior changes in that PR. We should do the
> code review more carefully.
>
> Xiao
Re: Missing HiveConf when starting PySpark from head
Thanks for catching this. Please feel free to submit a PR. I do not think Vanzin wants to introduce the behavior changes in that PR. We should do the code review more carefully.

Xiao

2018-06-14 9:18 GMT-07:00 Li Jin:
> Are there objections to restoring the behavior for PySpark users? I am happy
> to submit a patch.
Re: Missing HiveConf when starting PySpark from head
Are there objections to restoring the behavior for PySpark users? I am happy to submit a patch.

On Thu, Jun 14, 2018 at 12:15 PM Reynold Xin wrote:
> The behavior change is not good...
Re: Missing HiveConf when starting PySpark from head
The behavior change is not good...

On Thu, Jun 14, 2018 at 9:05 AM Li Jin wrote:
> Ah, looks like it's this change:
>
> https://github.com/apache/spark/commit/b3417b731d4e323398a0d7ec6e86405f4464f4f9#diff-3b5463566251d5b09fd328738a9e9bc5
Re: Missing HiveConf when starting PySpark from head
Ah, looks like it's this change:

https://github.com/apache/spark/commit/b3417b731d4e323398a0d7ec6e86405f4464f4f9#diff-3b5463566251d5b09fd328738a9e9bc5

It seems strange that by default Spark doesn't build with Hive, but by default PySpark requires it...

This might also be a behavior change for PySpark users who build Spark without Hive. The old behavior is "fall back to non-Hive support"; the new behavior is "the program won't start".

On Thu, Jun 14, 2018 at 11:51 AM, Sean Owen wrote:
> I think you would have to build with the 'hive' profile? But if so, that would
> have been true for a while now.
Re: Missing HiveConf when starting PySpark from head
I think you would have to build with the 'hive' profile? But if so, that would have been true for a while now.

On Thu, Jun 14, 2018 at 10:38 AM Li Jin wrote:
> Hey all,
>
> I just did a clean checkout of github.com/apache/spark but failed to start
> PySpark. This is what I did:
>
> git clone g...@github.com:apache/spark.git; cd spark; build/sbt package; bin/pyspark
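For reference, a sketch of what building with that profile typically looks like; the exact profile list is an assumption to double-check against the "Building Spark" documentation for the branch being built:

    # Maven build with Hive support
    ./build/mvn -Phive -Phive-thriftserver -DskipTests clean package

    # build/sbt accepts the same profiles
    ./build/sbt -Phive -Phive-thriftserver package

    ./bin/pyspark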
Re: Missing HiveConf when starting PySpark from head
I can work around it by using

bin/pyspark --conf spark.sql.catalogImplementation=in-memory

for now, but I still wonder what's going on with HiveConf. (A builder-based version of this workaround is sketched after the quoted message below.)

On Thu, Jun 14, 2018 at 11:37 AM, Li Jin wrote:
> Hey all,
>
> I just did a clean checkout of github.com/apache/spark but failed to start
> PySpark. This is what I did:
>
> git clone g...@github.com:apache/spark.git; cd spark; build/sbt package; bin/pyspark
>
> And got this exception:
>
> (spark-dev) Lis-MacBook-Pro:spark icexelloss$ bin/pyspark
> Python 3.6.3 |Anaconda, Inc.| (default, Nov 8 2017, 18:10:31)
> [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> 18/06/14 11:34:14 WARN NativeCodeLoader: Unable to load native-hadoop library
> for your platform... using builtin-java classes where applicable
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
> setLogLevel(newLevel).
>
> /Users/icexelloss/workspace/upstream2/spark/python/pyspark/shell.py:45:
> UserWarning: Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/Users/icexelloss/workspace/upstream2/spark/python/pyspark/shell.py",
>   line 41, in <module>
>     spark = SparkSession._create_shell_session()
>   File "/Users/icexelloss/workspace/upstream2/spark/python/pyspark/sql/session.py",
>   line 564, in _create_shell_session
>     SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
> TypeError: 'JavaPackage' object is not callable
>
> I also tried to delete the hadoop deps from my ivy2 cache and reinstall them,
> but no luck. I wonder:
>
> 1. I have not seen this before; could this be caused by a recent change to head?
> 2. Am I doing something wrong in the build process?
>
> Thanks much!
> Li
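As a follow-up to the workaround at the top of this message: the same setting can be passed through the public SparkSession builder when a session is created from a script rather than through bin/pyspark. A small sketch (the app name is made up for illustration):

    from pyspark.sql import SparkSession

    # Equivalent of `bin/pyspark --conf spark.sql.catalogImplementation=in-memory`:
    # use the in-memory catalog so no Hive classes are required.
    spark = (
        SparkSession.builder
        .appName("no-hive-session")  # hypothetical name, not from the thread
        .config("spark.sql.catalogImplementation", "in-memory")
        .getOrCreate()
    )

    spark.range(3).show()
    spark.stop()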