AlJohri commented on issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC partition column
URL: https://github.com/apache/spark/pull/21834#issuecomment-489357987

@gatorsmile @maropu This currently does not work with PySpark because of this line:

https://github.com/apache/spark/blob/d9bcacf94b93fe76542b5c1fd852559075ef6faa/python/pyspark/sql/readwriter.py#L563-L564

It tries to convert `lowerBound` and `upperBound` to an `int`, which fails when they are `datetime.datetime` objects. The resulting traceback is:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-2636f0dd1e0a> in <module>
     16                upperBound=now,
     17                numPartitions=sc.defaultParallelism,
---> 18                properties={'driver': 'org.postgresql.Driver'})
     19          .join(article_metadata, on=['url'], how='left')
     20          .orderBy('timestamp', ascending=False))

/usr/lib/spark/python/pyspark/sql/readwriter.py in jdbc(self, url, table, column, lowerBound, upperBound, numPartitions, predicates, properties)
    550             assert numPartitions is not None, \
    551                 "numPartitions can not be None when ``column`` is specified"
--> 552             return self._df(self._jreader.jdbc(url, table, column, int(lowerBound), int(upperBound),
    553                                                int(numPartitions), jprop))
    554         if predicates is not None:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'datetime.datetime'
```

I think just removing the `int()` cast may fix the issue, but I'm not 100% sure.
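In the meantime, a workaround that seems to sidestep the cast is to go through the generic options-based reader, which forwards `lowerBound`/`upperBound` as strings rather than calling `int()` on them, and strings are the form this PR's date/timestamp partitioning expects. This is only a minimal, untested sketch under that assumption; the JDBC URL, table name, and partition column below are hypothetical placeholders, and `spark`/`sc` are the usual session and context from the snippet above:

```python
from datetime import datetime, timedelta

now = datetime.now()
week_ago = now - timedelta(days=7)

# Sketch of a possible workaround (untested): skip DataFrameReader.jdbc(),
# whose Python wrapper casts the bounds with int(), and set the equivalent
# JDBC data source options directly. The options path passes the bounds
# through as strings.
df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost/mydb")  # hypothetical URL
      .option("driver", "org.postgresql.Driver")
      .option("dbtable", "articles")                      # hypothetical table
      .option("partitionColumn", "timestamp")             # hypothetical column
      .option("lowerBound", str(week_ago))                # e.g. '2019-04-26 10:15:00.000000'
      .option("upperBound", str(now))
      .option("numPartitions", sc.defaultParallelism)
      .load())
```

If that route works, it may be the least invasive option for PySpark users until the `jdbc()` wrapper itself handles non-numeric bounds.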
