[GitHub] [spark] Daniel-Davies commented on pull request #38867: [WIP] [SPARK-41234][SQL][PYTHON] Add functionality for array_insert

GitBox Mon, 05 Dec 2022 04:03:02 -0800


Daniel-Davies commented on PR #38867:
URL: https://github.com/apache/spark/pull/38867#issuecomment-1337219140


   @LuciferYang, for quick feedback, I'd be grateful for a high-level review of 
the approach, plus some help with the following questions:
   
   - Core behaviour: one notable property of Snowflake's array_insert 
function is that it lets you extend the array beyond (numElements + 1) if you 
specify a distant index. For example, array_insert([1,2,3], 10, 4) returns 
[1,2,3,null,null,null,null,null,null,4]. I'm wary of an array growing to 
enormous sizes because of a mistake (e.g. are we comfortable with the risk of 
the 'pos' column containing a value like 2,000,000,000?), so I currently return 
null when the provided 'pos' index is out of bounds. Let me know if the 
Snowflake behaviour should be reproduced exactly instead.
   - I've used the Scala standard library's 'patch' method to implement the 
behaviour, which favours minimal code over performance. Please let me know if 
this should be changed.
   - I'm not yet clear on how to implicitly cast the provided 'item' parameter 
(e.g. when I provide an array of LongType and try to insert an IntegerType). I 
think the array type should probably not change, though: if I provide a 
LongType array but a StringType item for insertion, it doesn't feel right to 
cast the whole array to StringType.
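A minimal sketch of the two candidate behaviours from the first two points, assuming 1-based positions as in the example above and using plain Scala Seqs rather than Spark's internal array types. The object and method names (ArrayInsertSketch, insertOrNull, insertWithPadding) are hypothetical, and None stands in for SQL null. Both variants rely on 'patch' with a replaced count of 0, which inserts without removing any elements.

```scala
// Hypothetical sketch of array_insert semantics on plain Scala Seqs.
// This is NOT Spark's implementation; None plays the role of SQL NULL.
object ArrayInsertSketch {

  /** Inserts `item` at 1-based position `pos`.
    * Returns None (i.e. null) when `pos` is outside 1..(arr.length + 1),
    * mirroring the "return null when out of bounds" choice.
    */
  def insertOrNull[T](arr: Seq[T], pos: Int, item: T): Option[Seq[T]] =
    if (pos < 1 || pos > arr.length + 1) None
    else Some(arr.patch(pos - 1, Seq(item), 0)) // replace 0 elements at index pos-1, i.e. insert

  /** Padding alternative, as described above: a position past the end
    * fills the gap with nulls (None) before appending `item`.
    */
  def insertWithPadding[T](arr: Seq[T], pos: Int, item: T): Seq[Option[T]] = {
    require(pos >= 1, "pos is assumed to be 1-based and positive")
    val gap = math.max(0, pos - 1 - arr.length) // number of nulls needed before pos
    val padded: Seq[Option[T]] =
      arr.map(x => Some(x): Option[T]) ++ Seq.fill(gap)(Option.empty[T])
    padded.patch(pos - 1, Seq(Some(item): Option[T]), 0)
  }
}
```

For example, insertWithPadding(Seq(1, 2, 3), 10, 4) yields ten elements (three values, six nulls, then 4), while insertOrNull(Seq(1, 2, 3), 10, 4) yields None. Note that the padding variant allocates pos - 1 - length nulls, so a 'pos' of 2,000,000,000 would attempt to build a roughly two-billion-element array, which is exactly the risk raised in the first point.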

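One possible policy for the casting question, sketched with simplified hypothetical type tags (IntegerT, LongT, StringT) rather than Spark's actual DataType classes: the array's element type stays fixed, the item is accepted only when it already matches or can be widened losslessly to that type, and anything else is rejected instead of promoting the whole array.

```scala
// Hypothetical sketch of a "keep the array type fixed" casting policy.
// The tags below are simplified stand-ins, not org.apache.spark.sql.types.
object InsertTypeSketch {
  sealed trait SimpleType
  case object IntegerT extends SimpleType
  case object LongT extends SimpleType
  case object StringT extends SimpleType

  /** Returns the element type the result array keeps, or None when the item
    * cannot be cast to the array's element type without changing that type.
    */
  def resolveInsertType(elementType: SimpleType, itemType: SimpleType): Option[SimpleType] =
    (elementType, itemType) match {
      case (e, i) if e == i  => Some(e)     // same type: no cast needed
      case (LongT, IntegerT) => Some(LongT) // widen the item; the array stays LongType
      case _                 => None        // e.g. a StringType item into a LongType array
    }
}
```

Under this policy an IntegerType item going into a LongType array is widened, while a StringType item is rejected rather than turning the array into StringType. Whether an IntegerType array should instead be widened to accept a LongType item is left open here; this sketch simply rejects it.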

