alamb commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2066290235
Nice -- thank you for the offer and information @samuelcolvin and @adriangb
## High level proposal
I think it would initially be possible to implement JSON support usi
samuelcolvin commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2061121332
tiny update to my example above, I realised there’s a much better comparison
query:
```sql
-- datafusion
SELECT count(*) FROM records where json_contains(att
adriangb commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2061047207
For what it’s worth I think having the ability to performantly parse JSON
stored as a String or Binary is valuable in and of itself. You don’t always
control how the data
samuelcolvin commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2060995995
@alamb if you're interested in JSON parsing support I might be interested in
contributing.
We (Pydantic) maintain a very fast Rust JSON parser (generally signifi
alamb commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2019276288
FYI this topic came up in our first meetup
https://github.com/apache/arrow-datafusion/discussions/8522
--
This is an automated message from the Apache Git Service.
To respo
alamb commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2016956940
I think we are now pretty close to being able to support JSON / JSONB via
scalar functions
The basic idea might be:
1. Implement ScalarUDFs for the relevant JSON/BSO
abuisman commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2016914220
I'd very much like it if there were jsonb or json support. I want to use the
pg_analytics extension and they use datafusion.
rtyler commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1939704217
From a user's standpoint I've run into this now from a Datafusion SQL
standpoint. As a SQL user I am hurting mostly by the lack of a
[get_json_object()](https://spark.apach
mwylde commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1924465343
We support a few JSON functions
(https://doc.arroyo.dev/sql/scalar-functions#json-functions) for querying JSON
data, and for more complex needs users can write Rust UDFs wit
alamb commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1923549690
> Similarly, for tables defined via SQL DDL, we support a JSON type that has
the same behavior.
How do people query such types? Do you have native operator support (lik
mwylde commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1922454327
Our immediate concern (which motivated our json extension type and the
changes in https://github.com/ArroyoSystems/arrow-rs/tree/49.0.0/json) is being
able to support partia
alamb commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1922212879
I think there are two major themes here:
## Theme 1
How to query such semi-structured data.
DataFusion today supports the Arrow type system, which while power
alamb commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1922184856
Cross posting. There are some interesting ideas in
https://github.com/apache/arrow-datafusion/discussions/9103#discussion-6168066
philippemnoel commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1921698433
Definitely interested!
thinkharderdev commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1779821351
We (Coralogix) built our own binary jsonb format (we call it jsona for json
arrow) that we are planning on open-sourcing in the next couple months
(hopefully Jan/Feb
alamb commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1779746226
Related proposal for user defined types:
https://github.com/apache/arrow-datafusion/issues/7923
yukkit commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1774370490
Discussion of support for extension type can be found at
https://github.com/apache/arrow-rs/issues/4472
dojiong commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1770272743
Maybe adding ExtensionType to Arrow's DataType would be more natural:
```rust
trait ExtensionType {
fn inner_type(&self) -> &DataType;
//
}
enu
dojiong commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1770236289
> have JSON and BSON be [extension
types](https://arrow.apache.org/docs/format/Columnar.html#extension-types), so
DataFusion could recognize them via field metadata.
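As a rough illustration of the field-metadata idea quoted above, the sketch below tags a plain storage column with an extension name and classifies it from that tag. It uses only the Rust standard library. The `Field` struct, the `Semantic` enum, and the names `example.json` and `example.bson` are hypothetical stand-ins invented for this sketch and are not arrow-rs or DataFusion APIs. Only the `ARROW:extension:name` key comes from the Arrow columnar format.

```rust
use std::collections::HashMap;

/// Metadata key the Arrow columnar format reserves for an extension type's name.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical extension names, chosen only for this sketch.
const JSON_EXT: &str = "example.json";
const BSON_EXT: &str = "example.bson";

/// Simplified stand-in for an Arrow field (not the arrow-rs `Field` type).
struct Field {
    name: String,
    storage_type: &'static str,
    metadata: HashMap<String, String>,
}

/// How an engine might interpret a column after inspecting its metadata.
#[derive(Debug, PartialEq)]
enum Semantic {
    Json,
    Bson,
    Plain,
}

/// Build a field, optionally tagging it with an extension name.
fn tagged(name: &str, storage_type: &'static str, ext: Option<&str>) -> Field {
    let mut metadata = HashMap::new();
    if let Some(ext_name) = ext {
        metadata.insert(EXTENSION_NAME_KEY.to_string(), ext_name.to_string());
    }
    Field { name: name.to_string(), storage_type, metadata }
}

/// Classify a field purely from its metadata; the storage type is left untouched.
fn classify(field: &Field) -> Semantic {
    match field.metadata.get(EXTENSION_NAME_KEY).map(String::as_str) {
        Some(JSON_EXT) => Semantic::Json,
        Some(BSON_EXT) => Semantic::Bson,
        _ => Semantic::Plain,
    }
}

/// Human-readable summary, e.g. for a debug log.
fn describe(field: &Field) -> String {
    format!("{}: {} ({:?})", field.name, field.storage_type, classify(field))
}
```

In this model the data stays in an ordinary string or binary column, and only the schema carries the JSON/BSON meaning, so a consumer that ignores the metadata still sees a valid plain column.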
wjones127 commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1769907740
> We could solve it in two ways:
> 1. add Json/Jsonb type to Arrow, then support it in DataFusion natively
> 2. only add json functions to DataFusion, treat utf8/bi
dojiong commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1769768742
> A third way could be to parse JSON data into Arrow Structs
> One limitation of this approach is that it requires all the JSON records
to have the same schema
Yea
alamb commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1769306077
I also think there is a solution that is part way between what @dojiong
proposes in
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1767459328:
store BS
alamb commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1769304206
A third way could be to parse JSON data into Arrow `Struct`s (which is what
the json reader does now) and then improve the Struct support in DataFusion
with the various JSON
dojiong commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1767459328
> DataFusion supports reading JSON in and some basic things like field
access (like json['field_name']
Indexed field access is only valid for `List` or `Struct` types
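To make that restriction concrete, here is a minimal standard-library sketch of the kind of type check involved. The `DataType` and `Key` enums and the `indexed_type` function are hypothetical names modeled loosely on Arrow's type system; this is not DataFusion's planner code.

```rust
/// Tiny stand-in for a handful of Arrow data types (hypothetical, for illustration).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Int64,
    List(Box<DataType>),
    Struct(Vec<(String, DataType)>),
}

/// The two kinds of subscript: `col[0]` and `col['name']`.
enum Key<'a> {
    Index(usize),
    Name(&'a str),
}

/// Return the type produced by `value[key]`, or an error when the
/// combination of value type and key kind is not supported.
fn indexed_type(dt: &DataType, key: &Key) -> Result<DataType, String> {
    match (dt, key) {
        // Lists accept a positional index and yield their item type.
        (DataType::List(item), Key::Index(_)) => Ok((**item).clone()),
        // Structs accept a field name and yield that field's type.
        (DataType::Struct(fields), Key::Name(wanted)) => fields
            .iter()
            .find(|(field_name, _)| field_name.as_str() == *wanted)
            .map(|(_, field_type)| field_type.clone())
            .ok_or_else(|| format!("no field named {}", wanted)),
        // Everything else, including a JSON document stored as Utf8, is rejected.
        (other, _) => Err(format!("unsupported indexed access on {:?}", other)),
    }
}
```

Under this model a JSON document held in a `Utf8` column cannot be subscripted at all, which is the gap the comment points out.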
alamb commented on issue #7845:
URL:
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1767413293
DataFusion supports reading JSON in and some basic things like field access
(like `json['field_name']`
However, it doesn't have the range of operators that Postgres does
dojiong opened a new issue, #7845:
URL: https://github.com/apache/arrow-datafusion/issues/7845
### Is your feature request related to a problem or challenge?
Datafusion does not support JSON/JSONB datatype. Is there a plan to support
it in the future?
### Describe the s