[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-02-19 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-782206308 @cloud-fan I have updated the tests PTAL This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-02-18 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-781590779 > the new fix LGTM. Do you know why the test didn't expose that bug previously? The old test was just comparing the row counts and not the actual data. I have updated

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-02-05 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-773471709 > Do we need to change this part? > >

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-02-04 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-773471709 > Do we need to change this part? > >

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-02-01 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-771216519 > @razajafri, do you mind clarifying PR description? For example, I thought you meant writing out to files or somewhere by: > > > Spark should read it as a long but

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-02-01 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-771025263 @tgravescs I have updated the test with the comment that you recommended, PTAL. @cloud-fan do you have any other questions?

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-01-28 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-769380453 > Another simpler idea is to fix the schema inference: > >

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-01-27 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-768845586 > Thank you for making a PR, @razajafri . Could you rebase this PR to the master branch please? I have rebased. PTAL

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-01-27 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-768845356 > @razajafri, do you mind clarifying PR description? For example, I thought you meant writing out to files or somewhere by: > > > Spark should read it as a long but

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-01-27 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-768618314 @revans2 @tgravescs is there anything else you guys need me to look at in this? This is my first PR in Spark, where do we go from here?

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-01-27 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-768517258 @revans2 I ran a test manually with two files with 1M records written with Spark 3.0.0. They were read in with Spark-3.0.0, Spark-3.1 and with master with my fix. Each file

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-01-27 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-768424921 > > ... but write it as an int by downcasting it ... > > @razajafri would you mind pointing out where it happens? Sure, here is the original

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-01-26 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-768074328 > The code looks fine to me but do you have any tests to show what the performance impact might be to other parquet files that have decimal values written as they typically

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-01-26 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-768066093 > I think this can be revised a bit to make it more understandable. I guess another approach is to initialize the `WriteColumnVector` to use `long` array instead of `int`. It

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-01-22 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-765743754 > I'm a bit confused by your description; it would be nice to add more detail. Looking at the code, I think what you are saying is that you read it as a long from the parquet

[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

2021-01-21 Thread GitBox
razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-765022515 @tgravescs @revans2 PTAL
