Re: Provide storage which use HBase's native client(not Asynchbase).

2016-05-29 Thread Jong Wook Kim
Following up on the month-old discussion:

I don’t think Asynchbase 1.7.1 supports an operation-level RPC timeout, and 
the related issue #11 has been pushed back from v1.5.0 to v1.8.0, so we 
cannot know when it will land unless we decide to contribute the change back 
to asynchbase ourselves.

The custom asynchbase currently adds the following three methods:

- org.hbase.async.Scanner.setRpcTimeout()
- org.hbase.async.GetRequest.setMaxResultsPerColumnFamily()
- org.hbase.async.GetRequest.setRowOffsetPerColumnFamily()

These exist only in SteamShon’s patched version, so I guess we still need to 
maintain our patched version separately.

To work on S2GRAPH-74, I have to solve the Guava version conflict issue, and 
I’ll go ahead and replace the 3 jars in s2core/lib with the shaded jar. 
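
For readers unfamiliar with shading, the relocation idea can be sketched as an sbt build fragment. This is only an illustration under assumptions: the shaded asynchbase jar discussed in this thread is built with maven-shade-plugin in the fork, whereas the fragment below assumes the sbt-assembly plugin is enabled, and the target package name `s2shaded` is hypothetical.

```scala
// build.sbt fragment (not a standalone program).
// Assumes the sbt-assembly plugin is added in project/plugins.sbt.
assembly / assemblyShadeRules := Seq(
  // Move Guava classes into a private package and rewrite references to them,
  // so that a different Guava version on the cluster classpath cannot clash.
  // "s2shaded" is a hypothetical package prefix chosen for this sketch.
  ShadeRule.rename("com.google.common.**" -> "s2shaded.com.google.common.@1").inAll
)
```

Older sbt versions spell the key as `assemblyShadeRules in assembly`; the relocation rule itself is the same.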

Jong Wook


> On May 3, 2016, at 11:38 AM, DO YUNG YOON wrote:
> 
> Agreed on avoiding unmanaged jars and publishing to Maven Central.
> 
> As far as I remember, we applied a custom patch to control the RPC timeout
> per request, but I guess Asynchbase 1.7.1 also supports it (not sure though).
> Let me check whether we still rely on the custom patch. If we don't need it,
> I think we should go with
> https://github.com/jongwook/asynchbase-shaded.
> 
> Thanks for your work, Jong Wook. I will update this thread after checking.
> 
> 
> On Mon, May 2, 2016 at 1:18 PM Jong Wook Kim wrote:
> 
>> So I went ahead and made an asynchbase package that shades Google Guava,
>> and thought it was a good chance to:
>> 
>> - avoid pulling duplicate Netty from two organizations - io.netty and
>> org.jboss.netty.
>> - remove log4j-over-slf4j from the runtime dependencies: that dependency
>> assumes the application has dropped log4j in favor of slf4j with some
>> other backend such as logback. Most Apache Hadoop/Spark environments
>> unfortunately use log4j, and due to their distributed nature it's not
>> easy to switch. As a library, it is better not to enforce any logging
>> implementation and to let the application decide.
>> 
>> I first started off making a Gradle project that pulls the official
>> Asynchbase 1.7.1 and relocates packages:
>> https://github.com/jongwook/asynchbase-shaded
>> 
>> But then I realized that s2graph was using a custom version of Asynchbase
>> for RPC timeout, etc., so I ended up making a fork of SteamShon/asynchbase
>> and used maven-shade-plugin to create the shaded jar:
>> https://github.com/jongwook/asynchbase
>> 
>> Running make && make pom.xml && mvn -DskipTests=true install will make a
>> shaded jar and pom in ~/.m2 provided that protoc 2.5.0 is installed, and
>> replacing jars at s2core/lib/ with the shaded jar seems to be working well.
>> 
>> Carrying around unmanaged jars in the repository is not a good idea, so we
>> need to consider publishing this version of asynchbase to maven central or
>> to some apache repo.
>> 
>> Jong Wook
>> 
>> 
>> 
>> On 21 April 2016 at 10:37, DO YUNG YOON wrote:
>> 
>>> Thanks for the suggestions, Jong Wook.
>>> 
>>> I think shading would solve the version conflict problems, but I am not
>>> familiar with it. Jong Wook, can you contribute your knowledge on shading?
>>> I think we should discuss the version conflict problems regardless of
>>> whether we provide native client storage, so it would be better to open a
>>> separate thread for that. What do you guys think?
>>> 
>>> As you mentioned, the native HBase client is an all-blocking API, and what
>>> we would most easily end up doing is wrapping the blocking API in Scala's
>>> Future. I was wondering whether anyone would want to use the HBase native
>>> client rather than asynchbase, but since s2graph's public interfaces are
>>> all asynchronous, I am not sure there is any benefit in using the blocking
>>> native client.
>>> 
>>> So you guys are for not providing native client storage? I am +1 on
>>> providing it, since it is easy and it is always better to provide more
>>> options. What do others think?
>>> 
>>> Best Regards
>>> DOYUNG YOON
>>> 
>>> On Thu, Apr 21, 2016 at 1:17 PM Hyunsung Jo wrote:
>>> 
>>>> It seems like Elasticsearch had similar problems:
>>>> https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>> 
>>>> On Wed, Apr 20, 2016 at 10:36 AM Jong Wook Kim wrote:
>>>> 
>>>>> As for the Guava version conflict, we should be able to shade the
>>>>> dependency into another package, maybe together with the whole of
>>>>> asynchbase. Guava versions have been a PITA for many other projects
>>>>> too, and the problem usually gets avoided this way.
>>>>> 
>>>>> If we can avoid the Asynchbase+Guava issue by shading them, and the
>>>>> only interesting reason left to switch is the benchmark, it might not
>>>>> be worth going back to the blocking API, as it would require a whole
>>>>> new threading design.
>>>>> 
>>>>> Sincerely,
>>>>> Jong Wook
>>>>> 
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On Apr 19, 2016, at 9:01 PM, DO YUNG YOON

Re: Provide storage which use HBase's native client(not Asynchbase).

2016-04-21 Thread DO YUNG YOON
Thanks for the suggestions, Jong Wook.

I think shading would solve the version conflict problems, but I am not
familiar with it. Jong Wook, can you contribute your knowledge on shading? I
think we should discuss the version conflict problems regardless of whether
we provide native client storage, so it would be better to open a separate
thread for that. What do you guys think?

As you mentioned, the native HBase client is an all-blocking API, and what we
would most easily end up doing is wrapping the blocking API in Scala's Future.
I was wondering whether anyone would want to use the HBase native
client rather than asynchbase, but since s2graph's public interfaces are all
asynchronous, I am not sure there is any benefit in using the blocking native client.

So you guys are for not providing native client storage? I am +1 on providing
it, since it is easy and it is always better to provide more options.
What do others think?
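
The wrapping described above could look roughly like the following sketch. It is not s2graph code: `BlockingClient`, `InMemoryClient`, and `getAsync` are hypothetical names, and an in-memory fake stands in for HBase's native client so the example runs without a cluster. The pool size of 4 is arbitrary.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingStorageSketch {
  // Hypothetical stand-in for a blocking client such as HBase's native API.
  trait BlockingClient {
    def get(rowKey: String): Option[String]
  }

  // In-memory fake so the sketch runs without an HBase cluster.
  final class InMemoryClient(data: Map[String, String]) extends BlockingClient {
    def get(rowKey: String): Option[String] = {
      Thread.sleep(10) // simulate a blocking network round trip
      data.get(rowKey)
    }
  }

  // Dedicated fixed-size pool: blocking calls run here instead of on the
  // default global execution context.
  private val pool = Executors.newFixedThreadPool(4)
  implicit val blockingEc: ExecutionContext =
    ExecutionContext.fromExecutorService(pool)

  // Asynchronous facade: the caller receives a Future immediately while the
  // blocking call occupies one thread of the dedicated pool.
  def getAsync(client: BlockingClient, rowKey: String): Future[Option[String]] =
    Future(client.get(rowKey))

  def main(args: Array[String]): Unit = {
    val client = new InMemoryClient(Map("row1" -> "value1"))
    val result = Await.result(getAsync(client, "row1"), 5.seconds)
    println(result)
    pool.shutdown() // pool threads are non-daemon, so release them to exit
  }
}
```

Each in-flight request still holds a pool thread for its full duration, so sizing and isolating that pool is the kind of threading design work a blocking client would bring with it.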

Best Regards
DOYUNG YOON

On Thu, Apr 21, 2016 at 1:17 PM Hyunsung Jo  wrote:

> It seems like Elasticsearch had similar problems:
> https://www.elastic.co/blog/to-shade-or-not-to-shade
>
> On Wed, Apr 20, 2016 at 10:36 AM Jong Wook Kim wrote:
>
> > As for the Guava version conflict, we should be able to shade the
> > dependency into another package, maybe together with the whole of
> > asynchbase. Guava versions have been a PITA for many other projects too,
> > and the problem usually gets avoided this way.
> >
> > If we can avoid the Asynchbase+Guava issue by shading them, and the only
> > interesting reason left to switch is the benchmark, it might not be worth
> > going back to the blocking API, as it would require a whole new threading
> > design.
> >
> > Sincerely,
> > Jong Wook
> >
> >
> > Sent from my iPhone
> >
> > > On Apr 19, 2016, at 9:01 PM, DO YUNG YOON wrote:
> > > >
> > > > Hi All.
> > > >
> > > > Since implementing storage has become easier (I believe), I think it
> > > > would be good to have an HBaseStorage which uses HBase's native
> > > > client. The reasons I bring this up are the following.
> > > >
> > > > 1. In some environments, especially specific Hadoop and Spark cluster
> > > >    distributions, s2core has a Guava version conflict which comes
> > > >    from asynchbase.
> > > >  - In many cases it is necessary to process a stream of edges and
> > > >    write them into HBase directly during stream processing.
> > > >  - Currently, there is no way to specify a version to avoid the above
> > > >    problem. With the native HBase client, users will be able to
> > > >    select the right version for their environment.
> > > > 2. It would be fun to run a benchmark on these two clients.
> > > >
> > > > Any feedback would be appreciated.
> > > >
> > > > Best Regards.
> > > > DOYUNG YOON
> >
>


Re: Provide storage which use HBase's native client(not Asynchbase).

2016-04-20 Thread Hyunsung Jo
It seems like Elasticsearch had similar problems:
https://www.elastic.co/blog/to-shade-or-not-to-shade

On Wed, Apr 20, 2016 at 10:36 AM Jong Wook Kim wrote:

> As for the Guava version conflict, we should be able to shade the
> dependency into another package, maybe together with the whole of
> asynchbase. Guava versions have been a PITA for many other projects too,
> and the problem usually gets avoided this way.
>
> If we can avoid the Asynchbase+Guava issue by shading them, and the only
> interesting reason left to switch is the benchmark, it might not be worth
> going back to the blocking API, as it would require a whole new threading
> design.
>
> Sincerely,
> Jong Wook
>
>
> Sent from my iPhone
>
> > On Apr 19, 2016, at 9:01 PM, DO YUNG YOON wrote:
> > >
> > > Hi All.
> > >
> > > Since implementing storage has become easier (I believe), I think it
> > > would be good to have an HBaseStorage which uses HBase's native client.
> > > The reasons I bring this up are the following.
> > >
> > > 1. In some environments, especially specific Hadoop and Spark cluster
> > >    distributions, s2core has a Guava version conflict which comes from
> > >    asynchbase.
> > >  - In many cases it is necessary to process a stream of edges and write
> > >    them into HBase directly during stream processing.
> > >  - Currently, there is no way to specify a version to avoid the above
> > >    problem. With the native HBase client, users will be able to select
> > >    the right version for their environment.
> > > 2. It would be fun to run a benchmark on these two clients.
> > >
> > > Any feedback would be appreciated.
> > >
> > > Best Regards.
> > > DOYUNG YOON
>