Re: Nutch + Gora + Hbase client ( BigTable )

2017-11-06 Thread SJC Multimedia
Thanks for the suggestion. I'm very interested in trying it out. Can you please
suggest the steps needed to build Gora from source so that I can modify
HBaseTableConnection?

I already have dependencies for bigtable and hbase-common 1.2.3 in my Ivy
file.

Thanks
Akshar

On Tue, Oct 31, 2017 at 12:27 PM, Alfonso Nishikawa <
alfonso.nishik...@gmail.com> wrote:

> Hi, Akshar.
>
> Most probably you are the first one to do what you are trying. I have never
> used Google Cloud Platform, but in case there is no answer to your
> question, my only suggestion would be to clone the repository [1] and try
> the bigtable dependency:
>
>   <dependency>
>     <groupId>com.google.cloud.bigtable</groupId>
>     <artifactId>bigtable-hbase-1.x-hadoop</artifactId>
>     <version>1.0.0-pre3</version>
>   </dependency>
>
> and add some "catch" blocks in the HBaseTableConnection class [2] to see
> what is happening there.
>
> I know this is not a solution, but I am at your disposal for any question
> about this approach (when I know the answer, of course).
>
> [1] https://github.com/apache/gora/tree/apache-gora-0.8
> [2] https://github.com/apache/gora/blob/apache-gora-0.8/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseTableConnection.java#L115
>
> Regards,
>
> Alfonso Nishikawa
>
>
>
> 2017-10-30 17:08 GMT-01:00 SJC Multimedia :
>
>> Hi
>>
>> I am trying out Google BigTable as a Nutch backend, for which there is no
>> official documentation saying it's supported. However, I don't see any
>> reason why it would not be possible, so I am giving it a shot.
>>
>> I have upgraded Gora to version 0.8 with Nutch 2.3.1, and the JDK to 1.8.
>>
>> Currently, while using the *bigtable-hbase-1.x-hadoop-1.0.0-pre3.jar*
>> version, the call to Bigtable fails while performing flushCommits as part
>> of the inject operation. I do see the table getting created on the BigTable
>> side, but the table is empty.
>>
>> The exception by itself is not enough to give us an answer. The
>> UnsupportedOperationException is a bit strange. I'm not sure where
>> that's coming from. Here's a guide on getting more information from a
>> RetriesExhaustedWithDetailsException, since neither Gora nor
>> BigtableBufferedMutator is under our control.
>>
>> This seems like a client-side thing, so this is likely some strange
>> interaction between the BigTable library and Gora.
>>
>> *Any suggestion on how exactly to figure out what is the issue here?*
>>
>>
>> Here is the gRPC session info:
>>
>> 2017-10-27 17:37:51,462 INFO  grpc.BigtableSession - Bigtable options:
>> BigtableOptions{dataHost=bigtable.googleapis.com, tableAdminHost=
>> bigtableadmin.googleapis.com, instanceAdminHost=bigtableadmin.googleapis.com,
>> projectId=xx-dev, instanceId=big-table-nutch-test,
>> userAgent=hbase-1.2.0-cdh5.13.0, credentialType=DefaultCredentials,
>> port=443, dataChannelCount=20, retryOptions=RetryOptions{retriesEnabled=true,
>> allowRetriesWithoutTimestamp=false, statusToRetryOn=[INTERNAL,
>> DEADLINE_EXCEEDED, ABORTED, UNAUTHENTICATED, UNAVAILABLE],
>> initialBackoffMillis=5, maxElapsedBackoffMillis=6,
>> backoffMultiplier=2.0, streamingBufferSize=60,
>> readPartialRowTimeoutMillis=6, maxScanTimeoutRetries=3},
>> bulkOptions=BulkOptions{asyncMutatorCount=2, useBulkApi=true,
>> bulkMaxKeyCount=25, bulkMaxRequestSize=1048576, autoflushMs=0,
>> maxInflightRpcs=1000, maxMemory=93218406, enableBulkMutationThrottling=false,
>> bulkMutationRpcTargetMs=100}, 
>> callOptionsConfig=CallOptionsConfig{useTimeout=false,
>> shortRpcTimeoutMs=6, longRpcTimeoutMs=60},
>> usePlaintextNegotiation=false}.
>>
>> I am getting the following error:
>>
>> 2017-10-27 17:37:51,660 ERROR store.HBaseStore - Failed 1 action:
>> UnsupportedOperationException: 1 time, servers with issues:
>> bigtable.googleapis.com,
>> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>> Failed 1 action: UnsupportedOperationException: 1 time, servers with
>> issues: bigtable.googleapis.com,
>> at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.handleExceptions(BigtableBufferedMutator.java:271)
>> at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.mutate(BigtableBufferedMutator.java:198)
>> at org.apache.gora.hbase.store.HBaseTableConnection.flushCommits(HBaseTableConnection.java:115)
>> at org.apache.gora.hbase.store.HBaseTableConnection.close(HBaseTableConnection.java:127)
>> at org.apache.gora.hbase.store.HBaseStore.close(HBaseStore.java:819)
>> at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:56)
>> at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
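The log above only reports that one action failed with an UnsupportedOperationException, not where that exception was thrown. Here is a minimal, JDK-only sketch of the kind of diagnostic the suggested "catch" in HBaseTableConnection could print. `CauseChain` is a hypothetical helper, not part of Gora, HBase or the Bigtable client. It lists a throwable and each of its causes together with the top stack frame (normally the place it was thrown), so the origin of the buried exception would end up in the log.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical diagnostic helper (not part of Gora, HBase or the Bigtable
 * client). It flattens a Throwable's cause chain so that the frame where a
 * buried exception was thrown becomes visible in a single log entry.
 */
final class CauseChain {

  private CauseChain() {}

  /** The throwable followed by each of its causes, outermost first. */
  static List<Throwable> chain(Throwable t) {
    List<Throwable> links = new ArrayList<>();
    // Identity-based "seen" set guards against (rare) cyclic cause chains.
    Map<Throwable, Boolean> seen = new IdentityHashMap<>();
    while (t != null && seen.put(t, Boolean.TRUE) == null) {
      links.add(t);
      t = t.getCause();
    }
    return links;
  }

  /** The innermost cause, or null when t is null. */
  static Throwable root(Throwable t) {
    List<Throwable> links = chain(t);
    return links.isEmpty() ? null : links.get(links.size() - 1);
  }

  /** One line per link: class name, message and the top stack frame. */
  static String describe(Throwable t) {
    StringBuilder sb = new StringBuilder();
    for (Throwable link : chain(t)) {
      StackTraceElement[] frames = link.getStackTrace();
      sb.append(link.getClass().getName()).append(": ").append(link.getMessage());
      if (frames.length > 0) {
        sb.append(" at ").append(frames[0]);
      }
      sb.append('\n');
    }
    return sb.toString();
  }
}
```

In a locally built Gora, one could catch `org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException e` around the flush in `HBaseTableConnection.flushCommits()` and, for each `i` below `e.getNumExceptions()`, log `CauseChain.describe(e.getCause(i))` together with `e.getRow(i)` and `e.getHostnamePort(i)` before rethrowing. Those accessors exist on the HBase exception class; exactly where to place the catch inside Gora 0.8 is an assumption. Printing the full stack trace of `e.getCause(i)` works as well; the compact form just keeps one line per cause.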

Re: Nutch + Gora + Hbase client ( BigTable )

2017-10-31 Thread lewis john mcgibbney
ACK, we only really try to support Apache distributions of the various
libraries. I think Alfonso's suggestion is best. Please keep in mind,
however, that Gora depends upon Hadoop 2.x now... you may also run into some
issues there.
Lewis
