Re: Parquet patch release

2017-01-06 Thread Dongjoon Hyun
Great! Thank you, Ryan. Bests, Dongjoon. On Fri, Jan 6, 2017 at 15:49 Xiao Li wrote: > Hi, Ryan, > > Really thank you for your help! > > Happy New Year! > > Xiao Li > > 2017-01-06 15:46 GMT-08:00 Ryan Blue : > > Last month, there was interest in

Re: Parquet patch release

2017-01-06 Thread Xiao Li
Hi, Ryan, Really thank you for your help! Happy New Year! Xiao Li 2017-01-06 15:46 GMT-08:00 Ryan Blue : > Last month, there was interest in a Parquet patch release on PR #16281 > . I went ahead and reviewed > commits that

Re: Parquet patch release

2017-01-06 Thread Reynold Xin
Thanks for the heads up, Ryan! On Fri, Jan 6, 2017 at 3:46 PM, Ryan Blue wrote: > Last month, there was interest in a Parquet patch release on PR #16281 > . I went ahead and reviewed > commits that should go into a Parquet

Parquet patch release

2017-01-06 Thread Ryan Blue
Last month, there was interest in a Parquet patch release on PR #16281 . I went ahead and reviewed commits that should go into a Parquet patch release and started a 1.8.2 discussion

Re: Tests failing with GC limit exceeded

2017-01-06 Thread shane knapp
(adding michael armbrust and josh rosen for visibility) ok. roughly 9% of all spark test builds (including PRB builds) are failing due to GC overhead limits.
$ wc -l SPARK_TEST_BUILDS GC_FAIL
1350 SPARK_TEST_BUILDS
125 GC_FAIL
here are the affected builds (over the past ~2 weeks): $
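The "roughly 9%" figure follows from the two `wc -l` counts in the message (GC failures over all Spark test builds):

```latex
\frac{125}{1350} \approx 0.0926 \approx 9.3\%
```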

Re: Tests failing with GC limit exceeded

2017-01-06 Thread shane knapp
On Fri, Jan 6, 2017 at 12:20 PM, shane knapp wrote: > FYI, this is happening across all spark builds... not just the PRB. s/all/almost all/

Re: Tests failing with GC limit exceeded

2017-01-06 Thread shane knapp
FYI, this is happening across all spark builds... not just the PRB. i'm compiling a report now and will email that out this afternoon. :( On Thu, Jan 5, 2017 at 9:00 PM, shane knapp wrote: > unsurprisingly, we had another GC: > >

handling of empty partitions

2017-01-06 Thread geoHeil
I am working on building a custom ML pipeline-model / estimator to impute missing values, e.g. I want to fill with the last known good value. Using a window function is slow and puts the data into a single partition. I built some sample code that uses the RDD API; however, it has some None / null problems
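A minimal sketch of a common two-pass way to forward-fill without collapsing everything into one partition: each partition reports its last non-null value, a small driver-side step turns those into a per-partition "carry-in", and a second pass fills nulls locally. Plain Python lists stand in for RDD partitions here (in Spark the passes would map onto something like `mapPartitionsWithIndex`). The function names are hypothetical, and the sketch assumes rows are already sorted within and across partitions.

```python
from typing import List, Optional, Sequence

Value = Optional[float]


def last_valid(partition: Sequence[Value]) -> Value:
    """Return the last non-null value in a partition, or None if it has none."""
    for v in reversed(partition):
        if v is not None:
            return v
    return None


def carry_ins(partitions: Sequence[Sequence[Value]]) -> List[Value]:
    """For each partition, the last valid value seen in any earlier partition.

    In Spark this would be the small driver-side step: collect one value
    per partition, then propagate it forward across partition boundaries.
    """
    result: List[Value] = []
    carry: Value = None
    for part in partitions:
        result.append(carry)
        tail = last_valid(part)
        if tail is not None:
            carry = tail
    return result


def fill_partition(partition: Sequence[Value], carry: Value) -> List[Value]:
    """Forward-fill nulls inside one partition, seeded with the carry-in value."""
    out: List[Value] = []
    last = carry
    for v in partition:
        if v is None:
            out.append(last)  # stays None if nothing valid has been seen yet
        else:
            last = v
            out.append(v)
    return out


def forward_fill_partitions(partitions: Sequence[Sequence[Value]]) -> List[List[Value]]:
    """Hypothetical end-to-end helper: carry-in pass, then local fill pass."""
    seeds = carry_ins(partitions)
    return [fill_partition(p, s) for p, s in zip(partitions, seeds)]


if __name__ == "__main__":
    parts = [[1.0, None, 3.0], [None, None], [None, 5.0, None]]
    print(forward_fill_partitions(parts))  # [[1.0, 1.0, 3.0], [3.0, 3.0], [3.0, 5.0, 5.0]]
```

Leading nulls with no earlier valid value remain `None`, so the imputer still has to decide explicitly how to treat them.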

Re: Converting an InternalRow to a Row

2017-01-06 Thread Andy Dang
Hi Liang-Chi, The snippet of code is below. If I bind the encoder early (the schema doesn't change throughout the execution), the final result is a list of the same entries. @RequiredArgsConstructor public class UDAF extends UserDefinedAggregateFunction { // Do not resolve and bind this
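One plausible explanation for "a list of the same entries", assuming the early-bound encoder hands back the same mutable row object on every call (Spark 2.x documents this for `ExpressionEncoder.toRow`, whose result callers should copy before the next call): every stored reference then points at whatever was written last. The Python sketch below reproduces that pitfall with a hypothetical `ReusingConverter` class (not a Spark API) and shows the usual fix of copying each result before keeping it.

```python
from typing import List


class ReusingConverter:
    """Hypothetical stand-in for a converter that is bound once and reused.

    It writes into a single pre-allocated buffer and returns that same
    list object on every call, overwriting its contents.
    """

    def __init__(self, width: int) -> None:
        self._buffer: List[int] = [0] * width

    def convert(self, values: List[int]) -> List[int]:
        self._buffer[:] = values  # overwrite in place; no new object is created
        return self._buffer


def collect_without_copy(conv: ReusingConverter, rows: List[List[int]]) -> List[List[int]]:
    # Every appended element is the same buffer object.
    return [conv.convert(r) for r in rows]


def collect_with_copy(conv: ReusingConverter, rows: List[List[int]]) -> List[List[int]]:
    # Copy each result before the next call overwrites the buffer.
    return [list(conv.convert(r)) for r in rows]


if __name__ == "__main__":
    rows = [[1, 2], [3, 4], [5, 6]]
    print(collect_without_copy(ReusingConverter(2), rows))  # [[5, 6], [5, 6], [5, 6]]
    print(collect_with_copy(ReusingConverter(2), rows))     # [[1, 2], [3, 4], [5, 6]]
```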

Re: Approach: Incremental data load from HBASE

2017-01-06 Thread Chetan Khatri
Hi Ayan, By incremental load from HBase I mean: a weekly batch job takes rows from an HBase table and dumps them into Hive. When I run the job the next time, it should only take newly arrived rows, the same as using Sqoop for an incremental load from an RDBMS to Hive with the command below: sqoop job --create myssb1
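Sqoop's saved incremental jobs remember a last value between runs (the command in the snippet is truncated, so its exact options are not shown). A minimal sketch of the same watermark idea for a weekly HBase-to-Hive batch, assuming each row carries a monotonically increasing timestamp (for example an HBase cell timestamp) and that the watermark is persisted between runs. `Row`, `incremental_batch`, and the in-memory lists are hypothetical stand-ins, not HBase or Hive APIs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Row:
    key: str
    ts: int  # assumed monotonically increasing write time, e.g. epoch millis
    value: str


def incremental_batch(source: Iterable[Row], last_ts: int) -> Tuple[List[Row], int]:
    """Return rows strictly newer than the watermark, plus the new watermark.

    The caller is assumed to persist the returned watermark and pass it
    back in on the next run.
    """
    new_rows = [r for r in source if r.ts > last_ts]
    new_watermark = max((r.ts for r in new_rows), default=last_ts)
    return new_rows, new_watermark


if __name__ == "__main__":
    table = [Row("a", 100, "x"), Row("b", 200, "y")]
    week1, wm = incremental_batch(table, last_ts=0)
    table.append(Row("c", 300, "z"))
    week2, wm = incremental_batch(table, last_ts=wm)
    print([r.key for r in week1], [r.key for r in week2], wm)  # ['a', 'b'] ['c'] 300
```

The strict `>` comparison avoids reloading the boundary row, but any row written late with an older timestamp would be skipped, so the choice of check column matters.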