AlJohri commented on issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC partition column URL: https://github.com/apache/spark/pull/21834#issuecomment-490221138

If anyone finds themselves here looking to do this in pyspark, until support for this feature is added, here is a workaround:

```python
import datetime
from itertools import tee

def date_range(start, end, intv):
    # Yield intv + 1 evenly spaced boundary points from start to end.
    diff = (end - start) / intv
    for i in range(intv):
        yield start + diff * i
    yield end

def pairwise(iterable):
    # Consecutive pairs: (s0, s1), (s1, s2), ...
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

partition_column = 'mypartitioncol'
now = datetime.datetime.now(datetime.timezone.utc)
num_partitions = sc.defaultParallelism
lower_bound = now + datetime.timedelta(-30)
upper_bound = now

predicates = []
for start, end in pairwise(date_range(lower_bound, upper_bound, num_partitions)):
    predicates.append(
        f"{partition_column} >= '{start.isoformat()}' AND {partition_column} < '{end.isoformat()}'"
    )

df = (spark.read.jdbc(
    url='myjdbcuri',
    table='mytable',
    predicates=predicates,
    properties={'driver': 'org.postgresql.Driver'}))
```

Associated [JIRA](https://issues.apache.org/jira/browse/SPARK-22814?focusedCommentId=16833145&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16833145) comment.
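For what it's worth, the predicate-generation part of the workaround can be sanity-checked standalone, without a SparkContext or a JDBC source. This sketch uses a made-up column name (`ts`) and fixed dates so the output is deterministic; it just shows that the ranges are contiguous and non-overlapping:

```python
import datetime
from itertools import tee

def date_range(start, end, intv):
    # Yield intv + 1 evenly spaced boundary points from start to end.
    diff = (end - start) / intv
    for i in range(intv):
        yield start + diff * i
    yield end

def pairwise(iterable):
    # Consecutive pairs: (s0, s1), (s1, s2), ...
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

# Hypothetical fixed bounds: 4 days split into 4 one-day partitions.
lower = datetime.datetime(2019, 1, 1, tzinfo=datetime.timezone.utc)
upper = datetime.datetime(2019, 1, 5, tzinfo=datetime.timezone.utc)

predicates = [
    f"ts >= '{s.isoformat()}' AND ts < '{e.isoformat()}'"
    for s, e in pairwise(date_range(lower, upper, 4))
]

print(len(predicates))
# -> 4
print(predicates[0])
# -> ts >= '2019-01-01T00:00:00+00:00' AND ts < '2019-01-02T00:00:00+00:00'
```

Because each predicate's upper boundary (`<`) equals the next one's lower boundary (`>=`), every row in the overall range lands in exactly one partition.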
