Hi,
I’ve got one more question as a follow-up to my prior question on working with
multi-file zipped CSVs. [1] I figured it was worth asking in a separate thread so
it would be easier for others to find the specific question about case_when.
I’m trying to accomplish something like pandas Series.map, where I map each
value of an Arrow array to a new value.
pyarrow.compute.case_when looks like a candidate to solve this, but after
reading the docs, I’m still not clear on how to structure the argument to the
“cond” parameter, or whether there is alternative functionality that would be
better suited.
Example input, mapping and expected output:
import pyarrow as pa
import pyarrow.compute as pc
map = {"a": 1, "b": 2, "c": 3}
input_array = pa.array(["a", "b", "c", "a"])
expected_output = pa.array([1, 2, 3, 1])
Logic I’m hoping for would be the equivalent of the following SQL:
CASE
  WHEN input_array = 'a' THEN 1
  WHEN input_array = 'b' THEN 2
  WHEN input_array = 'c' THEN 3
  ELSE input_array
END
Or alternatively, if input_array were a pandas Series, then
input_array.map(map).
Thanks again,
Ryan
[1] https://www.mail-archive.com/[email protected]/msg02379.html