AlJohri commented on issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC partition column
URL: https://github.com/apache/spark/pull/21834#issuecomment-490221138
 
 
   If anyone finds themselves here looking to do this in PySpark, here is a workaround until support for this feature is added:
   
   ```python
   import datetime
   from itertools import tee
   
   def date_range(start, end, intv):
       # yield intv + 1 evenly spaced datetimes from start to end (inclusive)
       diff = (end - start) / intv
       for i in range(intv):
           yield start + diff * i
       yield end
   
   def pairwise(iterable):
       # (s0, s1), (s1, s2), (s2, s3), ...
       a, b = tee(iterable)
       next(b, None)
       return zip(a, b)
   
   partition_column = 'mypartitioncol'
   now = datetime.datetime.now(datetime.timezone.utc)
   num_partitions = sc.defaultParallelism
   lower_bound = now - datetime.timedelta(days=30)
   upper_bound = now
   
   # build one non-overlapping predicate per partition, covering [lower_bound, upper_bound)
   predicates = []
   for start, end in pairwise(date_range(lower_bound, upper_bound, num_partitions)):
       predicates.append(f"{partition_column} >= '{start.isoformat()}' AND {partition_column} < '{end.isoformat()}'")
   
   df = (spark.read.jdbc(
               url='myjdbcuri',
               table='mytable',
               predicates=predicates,
               properties={'driver': 'org.postgresql.Driver'}))
   ```
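
   Each predicate maps to one partition of the resulting DataFrame, so you can sanity-check the split after the read (a minimal sketch, assuming the snippet above has already run in a session where `spark` and `sc` are defined):
   
   ```python
   # inspect the generated date-range predicates
   for p in predicates:
       print(p)
   
   # one JDBC partition is created per predicate
   print(df.rdd.getNumPartitions())  # == len(predicates) == num_partitions
   ```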
   
   Associated [JIRA](https://issues.apache.org/jira/browse/SPARK-22814?focusedCommentId=16833145&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16833145) comment.
