Re: Phoenix CSV Bulk Load fails to load a large file

2017-09-06 Thread Ted Yu
bq. hbase.bulkload.retries.retryOnIOException is disabled. Unable to recover

The above is from HBASE-17165.

See if the load can pass after enabling the config.
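For illustration, a minimal sketch of one way to enable that config when launching the
CSV bulk load (the table name and input path below are hypothetical; the property could
equally be set in the client-side hbase-site.xml or passed on the command line as
-Dhbase.bulkload.retries.retryOnIOException=true):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class BulkLoadWithRetryOnIOException {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // HBASE-17165: retry the bulk load RPC on IOException instead of giving up
        conf.setBoolean("hbase.bulkload.retries.retryOnIOException", true);
        // Hypothetical table and input path
        int exitCode = ToolRunner.run(conf, new CsvBulkLoadTool(),
                new String[] {"--table", "MY_TABLE", "--input", "/data/large_file.csv"});
        System.exit(exitCode);
    }
}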

On Wed, Sep 6, 2017 at 3:11 PM, Sriram Nookala  wrote:

> It finally times out with these exceptions
>
> Wed Sep 06 21:38:07 UTC 2017, RpcRetryingCaller{globalStartTime=1504731276347,
> pause=100, retries=35}, java.io.IOException: Call to
> ip-10-123-0-60.ec2.internal/10.123.0.60:16020 failed on local exception:
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=77,
> waitTime=60001, operationTimeout=6 expired.
>
> 17/09/06 21:38:07 ERROR mapreduce.LoadIncrementalHFiles:
> hbase.bulkload.retries.retryOnIOException is disabled. Unable to recover
>
> Exception in thread "main" java.io.IOException: BulkLoad encountered an
> unrecoverable problem
>

Re: Phoenix CSV Bulk Load fails to load a large file

2017-09-06 Thread Sriram Nookala
It finally times out with these exceptions

Wed Sep 06 21:38:07 UTC 2017,
RpcRetryingCaller{globalStartTime=1504731276347, pause=100, retries=35},
java.io.IOException: Call to ip-10-123-0-60.ec2.internal/10.123.0.60:16020
failed on local exception:
org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=77,
waitTime=60001, operationTimeout=6 expired.


at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)

at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:956)

at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:594)

at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:590)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.io.IOException: Call to ip-10-123-0-60.ec2.internal/
10.123.0.60:16020 failed on local exception:
org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=77,
waitTime=60001, operationTimeout=6 expired.

at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:292)

at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1274)

at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)

at
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)

at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.bulkLoadHFile(ClientProtos.java:35408)

at
org.apache.hadoop.hbase.protobuf.ProtobufUtil.bulkLoadHFile(ProtobufUtil.java:1676)

at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:656)

at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:645)

at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:137)

... 7 more

Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=77,
waitTime=60001, operationTimeout=6 expired.

at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:73)

at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1248)

... 14 more

17/09/06 21:38:07 ERROR mapreduce.LoadIncrementalHFiles:
hbase.bulkload.retries.retryOnIOException is disabled. Unable to recover

17/09/06 21:38:07 INFO zookeeper.ZooKeeper: Session: 0x15e58ca21fc004c
closed

17/09/06 21:38:07 INFO zookeeper.ClientCnxn: EventThread shut down

Exception in thread "main" java.io.IOException: BulkLoad encountered an
unrecoverable problem

at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:614)

at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:463)

at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:373)

at
org.apache.phoenix.mapreduce.AbstractBulkLoadTool.completebulkload(AbstractBulkLoadTool.java:355)

at
org.apache.phoenix.mapreduce.AbstractBulkLoadTool.submitJob(AbstractBulkLoadTool.java:332)

at
org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(AbstractBulkLoadTool.java:270)

at
org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(AbstractBulkLoadTool.java:183)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

at
org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:101)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.apache.hadoop.util.RunJar.run(RunJar.java:221)

at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
after attempts=35, exceptions:

Wed Sep 06 20:55:36 UTC 2017,
RpcRetryingCaller{globalStartTime=1504731276347, pause=100, retries=35},
java.io.IOException: Call to ip-10-123-0-60.ec2.internal/10.123.0.60:16020
failed on local exception:
org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=9,
waitTime=60002, operationTimeout=6 expired.

On Wed, Sep 6, 2017 at 5:01 PM, Sriram Nookala  wrote:

> Phoenix 4.11.0, HBase 1.3.1
>

Re: Phoenix CSV Bulk Load fails to load a large file

2017-09-06 Thread Sriram Nookala
Phoenix 4.11.0, HBase 1.3.1

This is what I get from jstack

"main" #1 prio=5 os_prio=0 tid=0x7fb3d0017000 nid=0x5de7 waiting on
condition [0x7fb3d75f7000]

   java.lang.Thread.State: WAITING (parking)

at sun.misc.Unsafe.park(Native Method)

- parking to wait for  <0xf588> (a
java.util.concurrent.FutureTask)

at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)

at java.util.concurrent.FutureTask.get(FutureTask.java:191)

at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:604)

at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:463)

at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:373)

at
org.apache.phoenix.mapreduce.AbstractBulkLoadTool.completebulkload(AbstractBulkLoadTool.java:355)

at
org.apache.phoenix.mapreduce.AbstractBulkLoadTool.submitJob(AbstractBulkLoadTool.java:332)

at
org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(AbstractBulkLoadTool.java:270)

at
org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(AbstractBulkLoadTool.java:183)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

at
org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:101)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.apache.hadoop.util.RunJar.run(RunJar.java:221)

at org.apache.hadoop.util.RunJar.main(RunJar.java:136)




On Wed, Sep 6, 2017 at 4:16 PM, Sergey Soldatov 
wrote:

> Do you have more details on the version of Phoenix/HBase you are using as
> well as how it hangs (Exceptions/messages that may help to understand the
> problem)?
>
> Thanks,
> Sergey
>
> On Wed, Sep 6, 2017 at 1:13 PM, Sriram Nookala 
> wrote:
>
>> I'm trying to load a 3.5G file with 60 million rows using
>> CsvBulkLoadTool. It hangs while loading HFiles. This runs successfully if I
>> split this into 2 files, but I'd like to avoid doing that. This is on
>> Amazon EMR; is this an issue due to disk space or memory? I have a single
>> master and 2 region servers, with 16 GB of memory on each node.
>>
>
>


Re: Phoenix CSV Bulk Load fails to load a large file

2017-09-06 Thread Sergey Soldatov
Do you have more details on the version of Phoenix/HBase you are using as
well as how it hangs (Exceptions/messages that may help to understand the
problem)?

Thanks,
Sergey

On Wed, Sep 6, 2017 at 1:13 PM, Sriram Nookala  wrote:

> I'm trying to load a 3.5G file with 60 million rows using CsvBulkLoadTool.
> It hangs while loading HFiles. This runs successfully if I split this into
> 2 files, but I'd like to avoid doing that. This is on Amazon EMR; is this
> an issue due to disk space or memory? I have a single master and 2 region
> servers, with 16 GB of memory on each node.
>


Phoenix CSV Bulk Load fails to load a large file

2017-09-06 Thread Sriram Nookala
I'm trying to load a 3.5G file with 60 million rows using CsvBulkLoadTool.
It hangs while loading HFiles. This runs successfully if I split this into
2 files, but I'd like to avoid doing that. This is on Amazon EMR; is this
an issue due to disk space or memory? I have a single master and 2 region
servers, with 16 GB of memory on each node.


Re: Phoenix CSV Bulk Load Tool Date format for TIMESTAMP

2017-09-06 Thread Sriram Nookala
I'm still trying to set those up on Amazon EMR. However, setting
`phoenix.query.dateFormatTimeZone` wouldn't fix the issue for all files,
since we could receive a different date format in other types of files.
Is there an option to write a custom mapper to transform the date?
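For illustration, a minimal sketch of passing a custom date format to the bulk load tool
through the client configuration, assuming the CSV loader honors the
phoenix.query.dateFormat / phoenix.query.dateFormatTimeZone properties Josh mentions
below (the table name, input path, and format value are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class BulkLoadWithCustomDateFormat {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Assumption: date/timestamp columns in the CSV are parsed with these
        // client-side properties (see http://phoenix.apache.org/tuning.html)
        conf.set("phoenix.query.dateFormat", "yyyyMMdd");
        conf.set("phoenix.query.dateFormatTimeZone", "UTC");
        // Hypothetical table and input path
        int exitCode = ToolRunner.run(conf, new CsvBulkLoadTool(),
                new String[] {"--table", "MY_TABLE", "--input", "/data/events.csv"});
        System.exit(exitCode);
    }
}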

On Tue, Sep 5, 2017 at 2:50 PM, Josh Elser  wrote:

> Sriram,
>
> Did you set the timezone and date-format configuration properties
> correctly for your environment?
>
> See `phoenix.query.dateFormatTimeZone` and `phoenix.query.dateFormat` as
> described at http://phoenix.apache.org/tuning.html
>
>
> On 9/5/17 2:05 PM, Sriram Nookala wrote:
>
>> I'm trying to bulk load data using the CsvBulkLoadTool. One of the columns
>> is a date in the format YYYYMMDD, for example 20160912. I don't get an
>> error, but the parsing is wrong: when I use sqlline I see the date show
>> up as 20160912-01-01 00:00:00.000. I had assumed, per the fix for
>> https://issues.apache.org/jira/browse/PHOENIX-1127, that all date values
>> would be parsed correctly.
>>
>


Re: Support of OFFSET in Phoenix 4.7

2017-09-06 Thread rafa
Hi Sumanta,

Here you have the answer. You already asked the same question some months
ago :)

https://mail-archives.apache.org/mod_mbox/phoenix-user/201705.mbox/browser

From 4.8 onward.

regards,
rafa
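
For reference, a minimal sketch of a paged query using the LIMIT/OFFSET syntax available
from 4.8 onward (the connection string, table, and columns are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PagedQueryExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
             Statement stmt = conn.createStatement();
             // Page 3 with a page size of 20: skip 40 rows, return the next 20
             ResultSet rs = stmt.executeQuery(
                     "SELECT ID, NAME FROM MY_TABLE ORDER BY ID LIMIT 20 OFFSET 40")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1) + "\t" + rs.getString(2));
            }
        }
    }
}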

On Wed, Sep 6, 2017 at 9:19 AM, Sumanta Gh  wrote:

> Hi,
> From which version of Phoenix is pagination with OFFSET supported? It
> seems this is not supported in 4.7.
>
> https://phoenix.apache.org/paged.html
>
> regards,
> Sumanta
>


Support of OFFSET in Phoenix 4.7

2017-09-06 Thread Sumanta Gh
Hi,
From which version of Phoenix is pagination with OFFSET supported? It seems
this is not supported in 4.7.

https://phoenix.apache.org/paged.html

regards,
Sumanta




Re: How to speed up write performance

2017-09-06 Thread James Taylor
Hi Hef,
Have you had a chance to read our Tuning Guide [1] yet? There's a lot of
good, general guidance there. There are some optimizations for write
performance that depend on how you expect/allow your data and schema to
change:
1) Is your data write-once? Make sure to declare your table with the
IMMUTABLE_ROWS=true property [2]. That will lower the overhead of secondary
indexes, since it's not necessary to read the data row (to get the old value)
prior to writing it.
2) Does your schema only change in an append-only manner? For example, are
columns only added, but never removed? If so, you can declare your table as
APPEND_ONLY_SCHEMA as described here [2].
3) Does your schema never change, or change only rarely at known times? If so,
you can declare an UPDATE_CACHE_FREQUENCY property as described here [2] to
reduce the RPC traffic (see the DDL sketch after this list).
4) Can you bulk load data [3] and then add or rebuild the index afterwards?
5) Have you investigated using local indexes [4]? They're optimized for
write speed since they ensure that the index data is on the same region
server as the data (i.e. all writes are local to the region server, no
cross region server calls, but there's some overhead at read time).
6) Have you considered not using secondary indexes and just letting your
less common queries be slower?
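
A hedged DDL sketch combining the table options from items 1-3 and the local index from
item 5 (the connection string, table, and columns are hypothetical; the option names
follow the options page [2]):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class WriteOptimizedSchemaExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
             Statement stmt = conn.createStatement()) {
            // Items 1-3: write-once rows, append-only schema, metadata cached 15 min
            stmt.execute("CREATE TABLE IF NOT EXISTS EVENTS ("
                    + " ID BIGINT NOT NULL PRIMARY KEY,"
                    + " PAYLOAD VARCHAR,"
                    + " CREATED_AT TIMESTAMP)"
                    + " IMMUTABLE_ROWS=true, APPEND_ONLY_SCHEMA=true,"
                    + " UPDATE_CACHE_FREQUENCY=900000");
            // Item 5: a local index keeps index writes on the same region server
            stmt.execute("CREATE LOCAL INDEX IF NOT EXISTS EVENTS_BY_TIME"
                    + " ON EVENTS (CREATED_AT)");
        }
    }
}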

Keep in mind, with secondary indexes, you're essentially writing your data
twice. You'll need to expect that your write performance will drop. As
usual, there's a set of tradeoffs that you need to understand and choose
according to your requirements.

Thanks,
James

[1] https://phoenix.apache.org/tuning_guide.html
[2] https://phoenix.apache.org/language/index.html#options
[3] https://phoenix.apache.org/bulk_dataload.html
[4] https://phoenix.apache.org/secondary_indexing.html#Local_Indexes

On Tue, Sep 5, 2017 at 11:48 AM, Josh Elser  wrote:

> 500 writes/second seems very low to me. On my wimpy laptop, I can easily
> see over 10K writes/second depending on the schema.
>
> The first check is to make sure that you have autocommit disabled.
> Otherwise, every update you make via JDBC will trigger an HBase RPC.
> Batching of RPCs to HBase is key to optimal performance via Phoenix.
>
> Regarding #2, unless you have intimate knowledge with how Phoenix writes
> data to HBase, do not investigate this approach.
>
>
> On 9/5/17 5:56 AM, Hef wrote:
>
>> Hi guys,
>> I'm evaluating using Phoenix to replace MySQL for better scalability.
>> The version I'm evaluating is 4.11-HBase-1.2, with some dependencies
>> modified to match CDH5.9 which we are using.
>>
>> The problem I'm having is that the write performance to Phoenix from JDBC
>> is too low: only 500 writes/second, while our data's throughput is almost
>> 50,000/s. My questions are:
>> 1. Is 500/s a normal TPS? How fast can you achieve in your production?
>> 2. Can I write directly into HBase with the mutation API and read from
>> Phoenix? That could be fast, but I don't see the secondary index being
>> created automatically in this case.
>>
>> Regards,
>> Hef
>>
>
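
To make the batching advice above concrete, a minimal sketch of the commit-in-batches
pattern Josh describes: auto-commit off, UPSERTs buffered client-side, commit() every N
rows (the connection string, table, and row source are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchedUpsertExample {
    // 'rows' is whatever source you iterate; each entry here is {id, value}
    static void load(List<long[]> rows) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host")) {
            conn.setAutoCommit(false);               // avoid one HBase RPC per upsert
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO METRICS (ID, VAL) VALUES (?, ?)")) {
                int batchSize = 1000;                // tune for row size and heap
                int count = 0;
                for (long[] row : rows) {
                    ps.setLong(1, row[0]);
                    ps.setLong(2, row[1]);
                    ps.executeUpdate();              // buffered until commit()
                    if (++count % batchSize == 0) {
                        conn.commit();               // one batched RPC to HBase
                    }
                }
                conn.commit();                       // flush the remainder
            }
        }
    }
}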