chenhao-db commented on PR #47796:
URL: https://github.com/apache/spark/pull/47796#issuecomment-2299501720

   @HyukjinKwon Unfortunately, the behavior is not the same. Running the query 
in Hive gives a different result:
   
   ```
   0: jdbc:hive2://> select xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b');
   +------+
   | _c0  |
   +------+
   | []   |
   +------+
   ```
   
   This discrepancy has existed since the `xpath` expression was first introduced to Spark.
The root cause is that Hive drops null array elements
([source](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/xml/GenericUDFXPath.java#L84)),
while Spark keeps them
([source](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala#L259)).
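
To illustrate the difference, here is a minimal Python sketch. It does not use the Hive or Spark code. It assumes that `a/b` selects element nodes, whose DOM node value is null, and that the two engines differ only in whether those nulls are filtered out. The helper names (`node_values`, `text_values`, `drop_nulls`, `keep_nulls`) are hypothetical, and `xml.dom.minidom` stands in for a real XPath evaluator.

```python
from xml.dom import minidom

XML = "<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>"


def _child_elements(xml, tag):
    # Stand-in for the XPath 'a/<tag>': direct child elements of the root.
    root = minidom.parseString(xml).documentElement
    return [n for n in root.childNodes
            if n.nodeType == n.ELEMENT_NODE and n.tagName == tag]


def node_values(xml, tag):
    # Element nodes carry no value of their own, so nodeValue is None.
    return [n.nodeValue for n in _child_elements(xml, tag)]


def text_values(xml, tag):
    # Stand-in for 'a/<tag>/text()': the value of each element's text child.
    return [n.firstChild.nodeValue for n in _child_elements(xml, tag)]


def drop_nulls(values):
    # Assumed Hive-like behavior: null elements are removed from the array.
    return [v for v in values if v is not None]


def keep_nulls(values):
    # Assumed Spark-like behavior: null elements stay in the array.
    return list(values)


print(drop_nulls(node_values(XML, "b")))  # []
print(keep_nulls(node_values(XML, "b")))  # [None, None, None]
print(text_values(XML, "b"))              # ['b1', 'b2', 'b3']
```

Under these assumptions, selecting the text nodes explicitly gives the same array with or without the null filter, so only queries that select nodes without a value are affected.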
   
   I'm not sure about the next step. We could make Spark consistent with Hive, 
but that would be a breaking change for Spark.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

