razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-782206308
@cloud-fan I have updated the tests PTAL
This is an automated message from the Apache Git Service.
To respond
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-781590779
> the new fix LGTM. Do you know why the test didn't expose that bug
previously?
The old test was just comparing the row counts and not the actual data. I
have updated
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-773471709
> Do we need to change this part?
>
>
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-771216519
> @razajafri, do you mind clarifying PR description? For example, I thought
you meant writing out to files or somewhere by:
>
> > Spark should read it as a long but
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-771025263
@tgravescs I have updated the test with the comment that you recommended, PTAL
@cloud-fan do you have any other questions?
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-769380453
> Another simpler idea is to fix the schema inference:
>
>
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-768845586
> Thank you for making a PR, @razajafri . Could you rebase this PR to the
master branch please?
I have rebased. PTAL
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-768845356
> @razajafri, do you mind clarifying PR description? For example, I thought
you meant writing out to files or somewhere by:
>
> > Spark should read it as a long but
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-768618314
@revans2 @tgravescs is there anything else you guys need me to look at in
this? This is my first PR in Spark, where do we go from here?
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-768517258
@revans2 I ran a test manually with two files with 1M records written with
Spark 3.0.0. They were read in with Spark-3.0.0, Spark-3.1 and with master with
my fix. Each file
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-768424921
> > ... but write it as an int by downcasting it ...
>
> @razajafri would you mind pointing out where it happens?
Sure, here is the original
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-768074328
> The code looks fine to me but do you have any tests to show what the
performance impact might be to other parquet files that have decimal values
written as they typically
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-768066093
> I think this can be revised a bit to make it more understandable. I guess
another approach is to initialize the `WriteColumnVector` to use `long` array
instead of `int`. It
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-765743754
> I'm a bit confused by your description, it would be nice to add more
detail. Looking at the code I think what you are saying is that you read it as
a long from the parquet
razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-765022515
@tgravescs @revans2 PTAL