Hi all, I am using PySpark to extract a value from a column based on a regex.
This is the regex I am using:

    (^\[OrderID:\s)?(?(1).*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*|\[.*\]\s\[([a-z0-9A-Z]*)\].*)

And this is the code:

    from pyspark.sql import functions as F

    # Two sample rows: one bracketed string and one plain string
    df = spark.createDataFrame(
        [("[1234] [3333] [4444] [66]",), ("abcd",)],
        ["stringValue"])

    # Extract group 1 of the regex into a new column
    result = df.withColumn(
        'extracted value',
        F.regexp_extract(
            F.col('stringValue'),
            r'(^\[OrderID:\s)?(?(1).*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*|\[.*\]\s\[([a-z0-9A-Z]*)\].*)',
            1))

I have tried this with spark.sql as well. It gives empty output. I have tested this regex in an online regex tester and it works fine there, but it does not work in Spark. I know Spark needs a Java-based regex, so I also tried escaping it, and that raised an exception:

    java.util.regex.PatternSyntaxException: Unknown inline modifier near index 21
    (^\[OrderID:\s)?(?(1).*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*|\[.*\]\s\[([a-z0-9A-Z]*)\].*)

Can you please help here?

Kind Regards,
Sachit Murarka
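
A note on one possible direction: java.util.regex does not support the conditional construct (?(1)...|...), which is most likely why the pattern is rejected with "Unknown inline modifier". A minimal sketch of a workaround, assuming a plain alternation is an acceptable substitute for the conditional, is shown below. It uses Python's re module only to check the rewritten pattern locally; the names PATTERN and extract_value are hypothetical and not part of my Spark job, and this has not been run in Spark.

```python
import re

# Rewritten pattern (an assumption, not the original regex): the conditional
# (?(1)...|...) is replaced by a non-capturing alternation, which uses only
# constructs that java.util.regex also accepts.
# Group 1: the UniqueID value when the string starts with "[OrderID: ".
# Group 2: the bracketed value picked up by the fallback branch.
PATTERN = re.compile(
    r"(?:^\[OrderID:\s.*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*"
    r"|\[.*\]\s\[([a-z0-9A-Z]*)\].*)"
)


def extract_value(text):
    """Return whichever capture group matched, or "" when nothing matches."""
    match = PATTERN.search(text)
    if match is None:
        return ""
    # Only one of the two groups can participate in a given match.
    return match.group(1) or match.group(2) or ""
```

In Spark, the same pattern string could presumably be passed to F.regexp_extract twice, once with group index 1 and once with group index 2, and the two results joined with F.concat, since regexp_extract returns an empty string for a group that did not match. Note that the greedy `.*` in the fallback branch makes it capture the last bracketed value in the string, as it does in the original regex.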