Re: [PR] [SPARK-48495][SQL][DOCS] Describe shredding scheme for Variant [spark]

via GitHub Wed, 18 Sep 2024 01:51:48 -0700


Zouxxyy commented on PR #46831:
URL: https://github.com/apache/spark/pull/46831#issuecomment-2357875732


   @cashmand Shredding will be a great improvement for Variant, and I am 
looking forward to its implementation! How is it progressing so far? After 
reading this document, I have some questions and would appreciate help:
   1. Will shredded variant be a new type? It currently appears to be a 
nested Struct type whose shape changes, so it is hard to see how it would be 
described.
   2. On the write side, how is the shredding schema generated adaptively? 
From the description in the document it looks dynamic. Is it decided at the 
table level, the file level, or even the row-group level? Also, the current 
design has many layers of nesting. Does this affect write overhead?
   3. On the read side, if the schema is file-level, how should Spark 
integrate it when reading? For example, if we want to extract a certain path 
but different files have different schemas, how should we determine 
the physical plan?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

