[ https://issues.apache.org/jira/browse/SPARK-26128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694330#comment-16694330 ]
Hyukjin Kwon commented on SPARK-26128:
--------------------------------------

I can't reproduce this:

```
scala> spark.range(10).write.parquet("/tmp/newparquet")
18/11/21 15:23:16 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 96.54% for 7 writers
18/11/21 15:23:16 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 84.47% for 8 writers
18/11/21 15:23:16 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 96.54% for 7 writers

scala> spark.read.parquet("/tmp/newparquet").where("id > 5").select(input_file_name()).show(5,false)
+------------------------------------------------------------------------------------------+
|input_file_name()                                                                         |
+------------------------------------------------------------------------------------------+
|file:///tmp/newparquet/part-00007-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet|
|file:///tmp/newparquet/part-00007-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet|
|file:///tmp/newparquet/part-00006-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet|
|file:///tmp/newparquet/part-00005-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet|
+------------------------------------------------------------------------------------------+

scala> spark.read.parquet("/tmp/newparquet").select(input_file_name()).show(5,false)
+------------------------------------------------------------------------------------------+
|input_file_name()                                                                         |
+------------------------------------------------------------------------------------------+
|file:///tmp/newparquet/part-00007-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet|
|file:///tmp/newparquet/part-00007-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet|
|file:///tmp/newparquet/part-00003-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet|
|file:///tmp/newparquet/part-00003-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet|
|file:///tmp/newparquet/part-00000-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet|
+------------------------------------------------------------------------------------------+
only showing top 5 rows
```

mind showing how {{"/tmp/newparquet"}} is made?

> filter breaks input_file_name
> -----------------------------
>
>                 Key: SPARK-26128
>                 URL: https://issues.apache.org/jira/browse/SPARK-26128
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 2.3.2
>            Reporter: Paul Praet
>            Priority: Minor
>
> This works:
> {code:java}
> scala> spark.read.parquet("/tmp/newparquet").select(input_file_name).show(5,false)
> +-----------------------------------------------------------------------------------------------------------------------------------------------------+
> |input_file_name()                                                                                                                                    |
> +-----------------------------------------------------------------------------------------------------------------------------------------------------+
> |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet|
> |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet|
> |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet|
> |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet|
> |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet|
> +-----------------------------------------------------------------------------------------------------------------------------------------------------+
> {code}
> When adding a filter:
> {code:java}
> scala> spark.read.parquet("/tmp/newparquet").where("key.station='XYZ'").select(input_file_name()).show(5,false)
> +-----------------+
> |input_file_name()|
> +-----------------+
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> +-----------------+
> {code}