unsubscribe

2020-04-22 Thread akram azarm
-- 
*M Akram Azarm*
*B Eng. in Software Engineering (Reading)*
*UOW,UK / IIT,LK*

Contact | 077-502-0402





Re: Error while reading hive tables with tmp/hidden files inside partitions

2020-04-22 Thread Wenchen Fan
This looks like a bug: the path filter doesn't work for Hive table reads.
Can you open a JIRA ticket?

On Thu, Apr 23, 2020 at 3:15 AM Dhrubajyoti Hati 
wrote:

> Just wondering if anyone could help me out on this.
>
> Thank you!
>
>
>
>
> *Regards, Dhrubajyoti Hati*
>
>
> On Wed, Apr 22, 2020 at 7:15 PM Dhrubajyoti Hati 
> wrote:
>
>> Hi,
>>
>> Is there any way to discard files starting with a dot (.) or ending with
>> .tmp in the Hive partitions while reading from a Hive table using the
>> spark.read.table method?
>>
>> I tried using PathFilters but they didn't work. I am using spark-submit
>> and passing my Python file (PySpark) containing the source code.
>>
>> spark.sparkContext._jsc.hadoopConfiguration().set("mapreduce.input.pathFilter.class",
>> "com.abc.hadoop.utility.TmpFileFilter")
>>
>> import org.apache.hadoop.fs.{Path, PathFilter}
>>
>> class TmpFileFilter extends PathFilter {
>>   override def accept(path: Path): Boolean = !path.getName.endsWith(".tmp")
>> }
>>
>> Still, in the detailed logs I can see that .tmp files are getting
>> considered:
>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
>> maprfs:///a/hour=05/host=abc/FlumeData.1587559137715
>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
>> maprfs:///a/hour=05/host=abc/FlumeData.1587556815621
>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
>> maprfs:///a/hour=05/host=abc/.FlumeData.1587560277337.tmp
>>
>>
>> Is there any way to discard the .tmp files or the hidden files (filenames
>> starting with a dot or underscore) in Hive partitions while reading from
>> Spark?
>>
>>
>>
>>
>> *Regards, Dhrubajyoti Hati*
>>
>
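The exclusion rule asked for above (skip dot-prefixed and underscore-prefixed hidden files, and in-flight .tmp files) boils down to a plain filename predicate. As a minimal Python sketch of that convention (the helper name and file list are illustrative, not part of any Spark or Hadoop API):

```python
def is_data_file(name: str) -> bool:
    """Reject hidden files (leading '.' or '_') and in-flight
    temporary files (trailing '.tmp'), keep everything else."""
    return not (name.startswith((".", "_")) or name.endswith(".tmp"))

# Filenames taken from the debug log above, plus a typical _SUCCESS marker.
files = [
    "FlumeData.1587559137715",
    "FlumeData.1587556815621",
    ".FlumeData.1587560277337.tmp",
    "_SUCCESS",
]
kept = [f for f in files if is_data_file(f)]
print(kept)  # only the two completed FlumeData files survive
```

This mirrors the behavior of Hadoop's default hidden-file filter plus the extra .tmp rule the poster's Scala PathFilter adds; the open question in the thread is why Spark's Hive table read path ignores the configured filter.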


Re: Getting the ball started on a 2.4.6 release

2020-04-22 Thread wuyi
We have a conclusion now and have decided to include SPARK-31509 in the PR
for SPARK-31485, so there should actually be only one candidate (though, to
be honest, it still depends on the committers).

Best,
Yi Wu



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Error while reading hive tables with tmp/hidden files inside partitions

2020-04-22 Thread Dhrubajyoti Hati
Just wondering if anyone could help me out on this.

Thank you!




*Regards, Dhrubajyoti Hati*


On Wed, Apr 22, 2020 at 7:15 PM Dhrubajyoti Hati 
wrote:

> Hi,
>
> Is there any way to discard files starting with a dot (.) or ending with
> .tmp in the Hive partitions while reading from a Hive table using the
> spark.read.table method?
>
> I tried using PathFilters but they didn't work. I am using spark-submit
> and passing my Python file (PySpark) containing the source code.
>
> spark.sparkContext._jsc.hadoopConfiguration().set("mapreduce.input.pathFilter.class",
> "com.abc.hadoop.utility.TmpFileFilter")
>
> import org.apache.hadoop.fs.{Path, PathFilter}
>
> class TmpFileFilter extends PathFilter {
>   override def accept(path: Path): Boolean = !path.getName.endsWith(".tmp")
> }
>
> Still, in the detailed logs I can see that .tmp files are getting
> considered:
> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
> maprfs:///a/hour=05/host=abc/FlumeData.1587559137715
> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
> maprfs:///a/hour=05/host=abc/FlumeData.1587556815621
> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
> maprfs:///a/hour=05/host=abc/.FlumeData.1587560277337.tmp
>
>
> Is there any way to discard the .tmp files or the hidden files (filenames
> starting with a dot or underscore) in Hive partitions while reading from
> Spark?
>
>
>
>
> *Regards, Dhrubajyoti Hati*
>


Re: Getting the ball started on a 2.4.6 release

2020-04-22 Thread Holden Karau
Thanks. I agree that improving that error message instead of hanging could
be a good candidate for backporting to 2.4.

On Tue, Apr 21, 2020 at 6:43 PM wuyi  wrote:

> I have one: https://issues.apache.org/jira/browse/SPARK-31485, which could
> cause the application to hang.
>
>
> And probably also https://issues.apache.org/jira/browse/SPARK-31509, to
> provide better guidance on barrier execution for users, but we do not have
> a conclusion on that yet.
>
> Best,
>
> Yi Wu
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


unsubscribe

2020-04-22 Thread Hemanth meka



SPIP: Catalog API for view metadata

2020-04-22 Thread John Zhuge
Hi everyone,

In order to disassociate view metadata from the Hive Metastore and support
different storage backends, I am proposing a new view catalog API to load,
create, alter, and drop views.

Document:
https://docs.google.com/document/d/1XOxFtloiMuW24iqJ-zJnDzHl2KMxipTjJoxleJFz66A/edit?usp=sharing
JIRA: https://issues.apache.org/jira/browse/SPARK-31357
WIP PR: https://github.com/apache/spark/pull/28147

As part of a project to support common views across query engines like
Spark and Presto, my team used this view catalog API in its Spark
implementation. The project has been in production for over three months.
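For a rough feel of the four operations the proposal names (load, create, alter, drop), here is a hedged in-memory sketch; the class and method names are hypothetical illustrations, not the actual SPARK-31357 interface, which is defined in the linked document and PR.

```python
from dataclasses import dataclass, field

@dataclass
class View:
    """Minimal stand-in for view metadata: a name, the defining SQL,
    and a free-form property map."""
    name: str
    sql: str
    properties: dict = field(default_factory=dict)

class InMemoryViewCatalog:
    """Toy catalog backing views with a dict instead of the Hive Metastore."""
    def __init__(self):
        self._views = {}

    def create_view(self, name, sql, **props):
        if name in self._views:
            raise ValueError(f"view {name} already exists")
        self._views[name] = View(name, sql, dict(props))

    def load_view(self, name):
        return self._views[name]  # raises KeyError if absent

    def alter_view(self, name, sql=None, **props):
        view = self._views[name]
        if sql is not None:
            view.sql = sql
        view.properties.update(props)

    def drop_view(self, name):
        return self._views.pop(name, None) is not None  # True if dropped

cat = InMemoryViewCatalog()
cat.create_view("v1", "SELECT 1", owner="analytics")
cat.alter_view("v1", sql="SELECT 2")
print(cat.load_view("v1").sql)  # SELECT 2
```

The point of the proposal is that a real implementation of such an interface could sit on any storage backend, with the Hive Metastore becoming just one plugin among several.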

Thanks,
John Zhuge