Re: Test coverage is now integrated to codecov.io

2020-03-01 Thread Bhavani Sudha
This is super useful. Thanks Ramachandran!

-Sudha

On Sat, Feb 29, 2020 at 7:42 PM leesf  wrote:

> Great job, thanks for your work.
>
> Sivabalan  于2020年2月29日周六 下午12:02写道:
>
> > Good job! Thanks for adding it.
> >
> > On Fri, Feb 28, 2020 at 5:41 PM vino yang  wrote:
> >
> > >  Hi Ram,
> > >
> > > Thanks for your great work to make the code coverage clear.
> > >
> > > Best,
> > > Vino
> > >
> > > Vinoth Chandar  于2020年2月29日周六 上午4:39写道:
> > >
> > > > Thanks Ram! This will definitely help improve the code quality over
> > time!
> > > >
> > > > On Fri, Feb 28, 2020 at 9:45 AM Ramachandran Madras Subramaniam
> > > >  wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Diff 1347 was merged into master yesterday. This enables visibility
> > > > > into code coverage of Hudi in general, and also provides insights
> > > > > into differential coverage during peer reviews.
> > > > >
> > > > > Since this integration is very recent, you might see partial results
> > > > > in your diff. There are two possible scenarios:
> > > > >
> > > > > 1. Your diff is not rebased on the latest master, so the code
> > > > > coverage report was not generated. To solve this, just rebase onto
> > > > > the latest master.
> > > > > 2. Code coverage ran but was reported as zero. There was one diff
> > > > > (#1350) where we saw this issue yesterday. This generally shouldn't
> > > > > happen; it could have been due to an outage of the codecov website.
> > > > > I will be monitoring upcoming diffs in the near future to see if this
> > > > > problem persists. Please ping me on the diff if you have any
> > > > > questions/concerns regarding code coverage.
> > > > >
> > > > > Thanks,
> > > > > Ram
> > > >
> > >
> >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Regards,
> > -Sivabalan
> >
>
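[Editorial note] Scenario 1 in the thread above (an unrebased diff missing the coverage report) can be sketched with a toy repository. This is a hedged demo, not the project's workflow: all branch and file names (my-feature, base.txt, etc.) are hypothetical.

```shell
# Toy demo: a feature branch that predates the coverage change on master
# gets rebased onto the latest master. All names here are hypothetical.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git checkout -qb master        # pin the branch name; newer git may default to 'main'
git config user.email dev@example.com
git config user.name dev
echo base > base.txt && git add base.txt && git commit -qm "initial commit"
git checkout -qb my-feature    # the PR branch
echo feature > feature.txt && git add feature.txt && git commit -qm "feature work"
git checkout -q master         # meanwhile master gains the codecov integration
echo coverage > coverage.txt && git add coverage.txt && git commit -qm "integrate codecov"
git checkout -q my-feature
git rebase -q master           # replay the feature commit on top of latest master
git log --oneline              # feature commit now sits above the codecov commit
```

On a real diff the equivalent would be `git fetch origin && git rebase origin/master`, followed by a force-push of the PR branch (e.g. `git push --force-with-lease`) so CI regenerates the coverage report.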


Re: Hudi 0.5.0 -> Hive JDBC call fails

2020-03-01 Thread selvaraj periyasamy
Thanks Vinoth. We do plan to move to Hive 2.x in the near future.
In the meantime, can I get any info on the workarounds for Hive 1.x versions?

Thanks,
Selva

On Sun, Mar 1, 2020 at 3:19 PM Vinoth Chandar  wrote:

> We have dropped support for Hive 1.x, a while back. Would you be able to
> move to Hive 2.x?
>
> IIRC there were some workarounds discussed on this thread before. But,
> given the push towards Hive 3.x, it's good to be on 2.x at least.
> Let me know and we can go from there :)
>
> On Sun, Mar 1, 2020 at 1:09 PM selvaraj periyasamy <
> selvaraj.periyasamy1...@gmail.com> wrote:
>
> > I am using Hudi 0.5.0 and writing with the Spark writer.
> >
> > My spark version is 2.3.0
> > Scala version 2.11.8
> > Hive version 1.2.2
> >
> > The write succeeds but the Hive call fails. From some Google references,
> > it seems the Hive client is a higher version than the server. Since Hudi
> > is built against Hive 2.3.1, is there a way to use 1.2.2?
> >
> > 2020-03-01 12:16:50 WARN  HoodieSparkSqlWriter$:110 - hoodie dataset at
> > hdfs://localhost:9000/projects/cdp/data/attunity_poc/attunity_rep_base
> > already exists. Deleting existing data & overwriting with new data.
> > [Stage 111:>
> > 2020-03-01 12:16:51 ERROR HiveConnection:697 - Error opening session
> > org.apache.thrift.TApplicationException: Required field 'client_protocol'
> > is unset! Struct:TOpenSessionReq(client_protocol:null,
> > configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000,
> > use:database=default})
> > at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
> > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
> > at org.apache.hudi.org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:168)
> > at org.apache.hudi.org.apache.hive.service.rpc.thrift.TCLIService$Client.OpenSession(TCLIService.java:155)
> >
> >
> > Thanks,
> > Selva
> >
>


Re: Hudi 0.5.0 -> Hive JDBC call fails

2020-03-01 Thread Vinoth Chandar
We have dropped support for Hive 1.x, a while back. Would you be able to
move to Hive 2.x?

IIRC there were some workarounds discussed on this thread before. But,
given the push towards Hive 3.x, it's good to be on 2.x at least.
Let me know and we can go from there :)

On Sun, Mar 1, 2020 at 1:09 PM selvaraj periyasamy <
selvaraj.periyasamy1...@gmail.com> wrote:

> I am using Hudi 0.5.0 and writing with the Spark writer.
>
> My spark version is 2.3.0
> Scala version 2.11.8
> Hive version 1.2.2
>
> The write succeeds but the Hive call fails. From some Google references,
> it seems the Hive client is a higher version than the server. Since Hudi
> is built against Hive 2.3.1, is there a way to use 1.2.2?
>
> 2020-03-01 12:16:50 WARN  HoodieSparkSqlWriter$:110 - hoodie dataset at
> hdfs://localhost:9000/projects/cdp/data/attunity_poc/attunity_rep_base
> already exists. Deleting existing data & overwriting with new data.
> [Stage 111:>
> 2020-03-01 12:16:51 ERROR HiveConnection:697 - Error opening session
> org.apache.thrift.TApplicationException: Required field 'client_protocol'
> is unset! Struct:TOpenSessionReq(client_protocol:null,
> configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000,
> use:database=default})
> at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
> at org.apache.hudi.org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:168)
> at org.apache.hudi.org.apache.hive.service.rpc.thrift.TCLIService$Client.OpenSession(TCLIService.java:155)
>
>
> Thanks,
> Selva
>


Hudi 0.5.0 -> Hive JDBC call fails

2020-03-01 Thread selvaraj periyasamy
I am using Hudi 0.5.0 and writing with the Spark writer.

My spark version is 2.3.0
Scala version 2.11.8
Hive version 1.2.2

The write succeeds but the Hive call fails. From some Google references,
it seems the Hive client is a higher version than the server. Since Hudi
is built against Hive 2.3.1, is there a way to use 1.2.2?

2020-03-01 12:16:50 WARN  HoodieSparkSqlWriter$:110 - hoodie dataset at
hdfs://localhost:9000/projects/cdp/data/attunity_poc/attunity_rep_base
already exists. Deleting existing data & overwriting with new data.
[Stage 111:>
2020-03-01 12:16:51 ERROR HiveConnection:697 - Error opening session
org.apache.thrift.TApplicationException: Required field 'client_protocol'
is unset! Struct:TOpenSessionReq(client_protocol:null,
configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000,
use:database=default})
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
at org.apache.hudi.org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:168)
at org.apache.hudi.org.apache.hive.service.rpc.thrift.TCLIService$Client.OpenSession(TCLIService.java:155)


Thanks,
Selva
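[Editorial note] One commonly suggested direction for the question above (running against a Hive 1.x server) is to rebuild Hudi with the bundled Hive version overridden so that the hive-jdbc client matches the server. This is a hedged, untested sketch: treating `hive.version` as an overridable Maven property is an assumption that should be verified against the pom of the release being built, and protocol differences between Hive 1.x and 2.x may still break at runtime even if the build succeeds.

```shell
# Hypothetical rebuild sketch (untested): compose the Maven command that
# would rebuild Hudi against Hive 1.2.2. The 'hive.version' property is an
# assumption -- check the Hudi pom before relying on it.
HIVE_VERSION="1.2.2"
BUILD_CMD="mvn clean install -DskipTests -Dhive.version=${HIVE_VERSION}"
echo "From a Hudi 0.5.0 source checkout, run:"
echo "  ${BUILD_CMD}"
```

The command itself would be run from a checkout of the matching Hudi release tag; it is printed rather than executed here because the build requires the full source tree and network access.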


Re: [INFORM] Apache Hudi talk at Strata Data & AI Conference

2020-03-01 Thread Vinoth Chandar
Thanks for the headsup, leesf :)

On Sat, Feb 29, 2020 at 11:02 PM leesf  wrote:

> hi all,
>
> I just noticed that Balaji and Vinoth will give a talk (Bring stream
> processing to batch data using Apache Hudi (incubating)) at the Strata Data &
> AI Conference, Mar 15–18. Do attend if interested, and check out the link
> below for more details:
>
> https://conferences.oreilly.com/strata-data-ai/stai-ca/public/schedule/detail/80221
>


[ANNOUNCE] Hudi Weekly Community Update(2020-02-23 ~ 2020-03-01)

2020-03-01 Thread leesf
Dear community,

Happy to share Hudi community weekly update for 2020-02-23 ~ 2020-03-01
with updates on development, features, bugs and tests.

Development

[KeyGenerator] A discussion on supporting complex record keys with
TimestampBasedKeyGenerator, possibly by combining TimestampBasedKeyGenerator
with ComplexKeyGenerator. [1]
[Improve Performance] A discussion on improving merge performance for
copy-on-write (COW) tables is ongoing; it aims to use Spark operators (e.g.
mapToPair) to merge records instead of ExternalSpillableMap. [2]
[Test Coverage] Test coverage is now integrated with codecov.io, which helps
improve code quality. [3]
[Release] Code is frozen for the next release (0.5.2), and RC1 will be sent
out in the next few days. [4]

[1]
https://lists.apache.org/thread.html/r177c2e0d499cb2c7ec1a4f62646b23d545683e2036df1a36e57e0fa3%40%3Cdev.hudi.apache.org%3E
[2]
https://lists.apache.org/thread.html/rcfd072e3ae90f401eb005e282a4a80fba10438e62ebb32f46f983d95%40%3Cdev.hudi.apache.org%3E
[3]
https://lists.apache.org/thread.html/ra753a535860a6f46f92f9c3e5a5e09c983f558989171c5de745818d9%40%3Cdev.hudi.apache.org%3E
[4]
https://lists.apache.org/thread.html/reeb63ccddee3324a60a59253f234672170434642049a65c97208c75a%40%3Cdev.hudi.apache.org%3E
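[Editorial note] The COW merge idea in [2] can be sketched in plain Java, with a stream-style latest-wins reduction standing in for Spark's mapToPair followed by reduceByKey. The record shape and the latest-wins-by-commit-time semantics below are illustrative assumptions, not Hudi's actual payload merge logic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeSketch {

    /** Hypothetical record shape: a key, a commit time, and a payload. */
    static final class Rec {
        final String key;
        final long commitTime;
        final String payload;
        Rec(String key, long commitTime, String payload) {
            this.key = key;
            this.commitTime = commitTime;
            this.payload = payload;
        }
    }

    /**
     * Latest-wins merge keyed by record key. In Spark this is the shape of
     * mapToPair(rec -> (rec.key, rec)) followed by reduceByKey(latestWins),
     * which lets the engine shuffle and combine records instead of staging
     * them all in an ExternalSpillableMap.
     */
    static Map<String, Rec> merge(List<Rec> incoming) {
        Map<String, Rec> out = new HashMap<>();
        for (Rec r : incoming) {
            // keep whichever record has the later commit time
            out.merge(r.key, r, (a, b) -> a.commitTime >= b.commitTime ? a : b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> recs = List.of(
                new Rec("k1", 1L, "old"),
                new Rec("k1", 2L, "new"),
                new Rec("k2", 1L, "only"));
        Map<String, Rec> merged = merge(recs);
        System.out.println(merged.get("k1").payload); // prints "new"
    }
}
```

The key design point under discussion is that a reduceByKey-style merge is distributed and spill-managed by the engine, whereas an ExternalSpillableMap concentrates the merge in a single in-memory (and spill-to-disk) structure per task.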


Features

[Spark Integration] Enable incremental pulling from defined partitions. [5]
[Code Cleanup] Clean up the package structure in hudi-client. [6]
[CI] Aggregate code coverage and publish it to codecov.io during CI. [7]


[5] https://jira.apache.org/jira/browse/HUDI-597
[6] https://jira.apache.org/jira/browse/HUDI-554
[7] https://jira.apache.org/jira/browse/HUDI-627
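[Editorial note] The CI wiring in [7] typically looks something like the fragment below. This is a hedged sketch of a Travis-style configuration, not the project's actual CI file; the bash uploader endpoint shown was the standard codecov upload path at the time.

```yaml
# Hypothetical Travis-style CI fragment (not Hudi's actual config):
# run the build with tests so coverage reports are produced, then
# upload the aggregated report to codecov.io.
script:
  - mvn clean verify
after_success:
  - bash <(curl -s https://codecov.io/bash)
```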


Bugs

[Writer] Fix performance issues around DiskBasedMap & Kryo. [8]
[Release] Fix "could not get sources" warnings while compiling. [9]

[8] https://jira.apache.org/jira/browse/HUDI-625
[9] https://jira.apache.org/jira/browse/HUDI-636


Tests

[Tests] Add unit tests for PriorityBasedFileSystemView. [10]

[10] https://jira.apache.org/jira/browse/HUDI-618

Best,
Leesf