unsubscribe

2020-04-22 Thread akram azarm
-- 
*M Akram Azarm*
*B Eng. in Software Engineering (Reading)*
*UOW,UK / IIT,LK*

Contact | 077-502-0402





Re: Error while reading hive tables with tmp/hidden files inside partitions

2020-04-22 Thread Wenchen Fan
This looks like a bug: the path filter doesn't work for Hive table reads.
Can you open a JIRA ticket?

On Thu, Apr 23, 2020 at 3:15 AM Dhrubajyoti Hati 
wrote:

> Just wondering if anyone could help me out on this.
>
> Thank you!
>
>
>
>
> *Regards, Dhrubajyoti Hati*
>
>
> On Wed, Apr 22, 2020 at 7:15 PM Dhrubajyoti Hati 
> wrote:
>
>> Hi,
>>
>> Is there any way to discard files starting with a dot (.) or ending with
>> .tmp in the Hive partitions while reading from a Hive table using the
>> spark.read.table method?
>>
>> I tried using PathFilters but they didn't work. I am using spark-submit
>> and passing my Python file (PySpark) containing the source code.
>>
>> spark.sparkContext._jsc.hadoopConfiguration().set("mapreduce.input.pathFilter.class",
>> "com.abc.hadoop.utility.TmpFileFilter")
>>
>> import org.apache.hadoop.fs.{Path, PathFilter}
>>
>> class TmpFileFilter extends PathFilter {
>>   override def accept(path: Path): Boolean = !path.getName.endsWith(".tmp")
>> }
>>
>> Still, in the detailed logs I can see that .tmp files are getting
>> considered:
>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
>> maprfs:///a/hour=05/host=abc/FlumeData.1587559137715
>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
>> maprfs:///a/hour=05/host=abc/FlumeData.1587556815621
>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
>> maprfs:///a/hour=05/host=abc/.FlumeData.1587560277337.tmp
>>
>>
>> Is there any way to discard the .tmp files or the hidden files (filenames
>> starting with a dot or underscore) in Hive partitions while reading from
>> Spark?
>>
>>
>>
>>
>> *Regards, Dhrubajyoti Hati*
>>
>
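The exclusion rule asked for above (skip dot-prefixed and underscore-prefixed hidden files, and in-flight .tmp files) boils down to a plain filename predicate. As a minimal Python sketch of that convention (the helper name and file list are illustrative, not part of any Spark or Hadoop API):

```python
def is_data_file(name: str) -> bool:
    """Reject hidden files (leading '.' or '_') and in-flight
    temporary files (trailing '.tmp'), keep everything else."""
    return not (name.startswith((".", "_")) or name.endswith(".tmp"))

# Filenames taken from the debug log above, plus a typical _SUCCESS marker.
files = [
    "FlumeData.1587559137715",
    "FlumeData.1587556815621",
    ".FlumeData.1587560277337.tmp",
    "_SUCCESS",
]
kept = [f for f in files if is_data_file(f)]
print(kept)  # only the two completed FlumeData files survive
```

This mirrors the behavior of Hadoop's default hidden-file filter plus the extra .tmp rule the poster's Scala PathFilter adds; the open question in the thread is why Spark's Hive table read path ignores the configured filter.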


Re: Getting the ball started on a 2.4.6 release

2020-04-22 Thread wuyi
We have a conclusion now and have decided to include SPARK-31509 in the PR
for SPARK-31485, so there should actually be only one candidate (though, to
be honest, it still depends on the committers).

Best,
Yi Wu



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Error while reading hive tables with tmp/hidden files inside partitions

2020-04-22 Thread Dhrubajyoti Hati
Just wondering if anyone could help me out on this.

Thank you!




*Regards, Dhrubajyoti Hati*


On Wed, Apr 22, 2020 at 7:15 PM Dhrubajyoti Hati 
wrote:

> Hi,
>
> Is there any way to discard files starting with a dot (.) or ending with
> .tmp in the Hive partitions while reading from a Hive table using the
> spark.read.table method?
>
> I tried using PathFilters but they didn't work. I am using spark-submit
> and passing my Python file (PySpark) containing the source code.
>
> spark.sparkContext._jsc.hadoopConfiguration().set("mapreduce.input.pathFilter.class",
> "com.abc.hadoop.utility.TmpFileFilter")
>
> import org.apache.hadoop.fs.{Path, PathFilter}
>
> class TmpFileFilter extends PathFilter {
>   override def accept(path: Path): Boolean = !path.getName.endsWith(".tmp")
> }
>
> Still, in the detailed logs I can see that .tmp files are getting
> considered:
> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
> maprfs:///a/hour=05/host=abc/FlumeData.1587559137715
> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
> maprfs:///a/hour=05/host=abc/FlumeData.1587556815621
> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
> maprfs:///a/hour=05/host=abc/.FlumeData.1587560277337.tmp
>
>
> Is there any way to discard the .tmp files or the hidden files (filenames
> starting with a dot or underscore) in Hive partitions while reading from
> Spark?
>
>
>
>
> *Regards, Dhrubajyoti Hati*
>


Re: Getting the ball started on a 2.4.6 release

2020-04-22 Thread Holden Karau
Thanks. I agree that improving that error message instead of hanging could
be a good candidate for backporting to 2.4.

On Tue, Apr 21, 2020 at 6:43 PM wuyi  wrote:

> I have one: https://issues.apache.org/jira/browse/SPARK-31485, which could
> cause the application to hang.
>
>
> And probably also https://issues.apache.org/jira/browse/SPARK-31509, to
> provide better guidance on barrier execution for users, but we do not have
> a conclusion on that yet.
>
> Best,
>
> Yi Wu
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


unsubscribe

2020-04-22 Thread Hemanth meka



SPIP: Catalog API for view metadata

2020-04-22 Thread John Zhuge
Hi everyone,

In order to disassociate view metadata from the Hive Metastore and support
different storage backends, I am proposing a new view catalog API to load,
create, alter, and drop views.

Document:
https://docs.google.com/document/d/1XOxFtloiMuW24iqJ-zJnDzHl2KMxipTjJoxleJFz66A/edit?usp=sharing
JIRA: https://issues.apache.org/jira/browse/SPARK-31357
WIP PR: https://github.com/apache/spark/pull/28147

As part of a project to support common views across query engines like
Spark and Presto, my team used this view catalog API in its Spark
implementation. The project has been in production for over three months.
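For a rough feel of the four operations the proposal names (load, create, alter, drop), here is a hedged in-memory sketch; the class and method names are hypothetical illustrations, not the actual SPARK-31357 interface, which is defined in the linked document and PR.

```python
from dataclasses import dataclass, field

@dataclass
class View:
    """Minimal stand-in for view metadata: a name, the defining SQL,
    and a free-form property map."""
    name: str
    sql: str
    properties: dict = field(default_factory=dict)

class InMemoryViewCatalog:
    """Toy catalog backing views with a dict instead of the Hive Metastore."""
    def __init__(self):
        self._views = {}

    def create_view(self, name, sql, **props):
        if name in self._views:
            raise ValueError(f"view {name} already exists")
        self._views[name] = View(name, sql, dict(props))

    def load_view(self, name):
        return self._views[name]  # raises KeyError if absent

    def alter_view(self, name, sql=None, **props):
        view = self._views[name]
        if sql is not None:
            view.sql = sql
        view.properties.update(props)

    def drop_view(self, name):
        return self._views.pop(name, None) is not None  # True if dropped

cat = InMemoryViewCatalog()
cat.create_view("v1", "SELECT 1", owner="analytics")
cat.alter_view("v1", sql="SELECT 2")
print(cat.load_view("v1").sql)  # SELECT 2
```

The point of the proposal is that a real implementation of such an interface could sit on any storage backend, with the Hive Metastore becoming just one plugin among several.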

Thanks,
John Zhuge