The explain plan for your tables isn't a substitute for the DDLs. Please provide those.

How about sharing your complete hbase-site.xml and hbase-env.sh files, rather than just the snippets you have? A full picture is often needed.

Given that HBase cannot directly run on S3, please also describe how you have this setup functioning. Do you have an HDFS instance running for the HBase WALs? Finally, assuming you're on EC2 since you're using S3, what kind of nodes do you have?
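
For reference, the arrangement I'd expect is hbase.rootdir pointing at S3 with the WALs kept on HDFS via hbase.wal.dir. This is a sketch only; the bucket name and NameNode address below are placeholders:

  <property>
       <name>hbase.rootdir</name>
       <value>s3a://your-bucket/hbase</value>
  </property>
  <property>
       <name>hbase.wal.dir</name>
       <value>hdfs://namenode:8020/hbase-wal</value>
  </property>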

As a courtesy, please take the time to consolidate your thoughts into one email instead of sending several in rapid succession like this. Remember that Apache communities are volunteer-based.

On 7/11/18 7:53 AM, alchemist wrote:

I have written a threaded program to batch-upsert data into Phoenix. I am
using Phoenix because of its secondary index capability. I am getting
extremely slow write performance.

The explain plan looks like the following:

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from VBQL_PHOENIX_TRANSCRIPT5;
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
|                                                     PLAN                                                        | EST_BYTES_READ  | EST_ |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5  | 314572800       | 6838 |
|     SERVER FILTER BY FIRST KEY ONLY                                                                            | 314572800       | 6838 |
|     SERVER AGGREGATE INTO SINGLE ROW                                                                           | 314572800       | 6838 |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+

I have a secondary index; the plan for querying it directly looks like this:

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from VBQL_PHOENIX_TRANSCRIPT_INDEX5;
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
|                                                     PLAN                                                        | EST_BYTES_READ  | EST_ |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5  | 314572800       | 6838 |
|     SERVER FILTER BY FIRST KEY ONLY                                                                            | 314572800       | 6838 |
|     SERVER AGGREGATE INTO SINGLE ROW                                                                           | 314572800       | 6838 |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+


Tuning settings used in HBase are below. The problem is that this is very hard
to scale: I tried adding more nodes to the HBase cluster, and I also tried
adding more threads to the client program, but it does not scale beyond 6K
writes per minute, which is VERY SLOW. Any help is greatly appreciated.

  <property>
       <name>index.writer.threads.max</name>
       <value>30</value>
  </property>
  <property>
       <name>index.builder.threads.max</name>
       <value>30</value>
  </property>
  <property>
       <name>phoenix.query.threadPoolSize</name>
       <value>256</value>
  </property>
  <property>
       <name>index.builder.threads.keepalivetime</name>
       <value>90000</value>
  </property>
  <property>
       <name>phoenix.query.timeoutMs</name>
       <value>90000</value>
  </property>

  The rest of the settings I am using are the defaults.

Source code (a simple multithreaded writer; each write needs to be saved
individually, so we cannot use batched writes):

  public void generateAndInsertDataToHbase(int iterationCount) {
    try {
      int iterations = 0;
      Instant start = Instant.now();
      ExecutorService executorService = Executors.newCachedThreadPool();
      while (true) {
        if (iterations == iterationCount) {
          verifyRowCountAndExit(start, executorService);
        }
        Future<Void> future = executorService.submit(new Callable<Void>() {
          public Void call() throws Exception {
            List<VbqlBean> vbqlList = VbqlUtils.generateRandomAccountVbql(accountCategoryMap);
            if (vbqlList.size() > 0) {
              addVbqlAccountToHbaseAsBatch(vbqlList, connStr);
            }
            return null;
          }
        });
        // Block until the submitted task finishes before submitting the next one.
        future.get();
        iterations++;
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
  public void addVbqlAccountToHbaseAsBatch(List<VbqlBean> vbqlList, String connStr) throws Exception {
    Connection connection = null;
    try {
      connection = DBUtils.initSingleConnection(connStr);
      // connection.setAutoCommit(false);
      if (vbqlList.size() > 0) {
        for (VbqlBean vbqlBean : vbqlList) {
          // Each row is written via its own call (passes connStr, not the open connection).
          DBUtils.executeSingleVbqlQuery(connStr, vbqlBean, tableName);
        }
      }
      connection.commit();
    } finally {
      if (connection != null) try { connection.close(); } catch (SQLException ignore) {}
    }
  }
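
Even though in our case each record has to be saved individually, for
comparison the usual Phoenix batching pattern (buffer many UPSERTs on one
connection with auto-commit off, then commit once) would look roughly like
this. This is a minimal sketch; the ID and PAYLOAD columns are placeholders,
not our real schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchUpsertSketch {
    // Buffer many UPSERTs on a single connection and flush them with one
    // commit(); with auto-commit off, Phoenix queues the mutations
    // client-side until commit() is called.
    public static void upsertBatch(String connStr, List<String[]> rows) throws Exception {
        try (Connection conn = DriverManager.getConnection(connStr)) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO VBQL_PHOENIX_TRANSCRIPT5 (ID, PAYLOAD) VALUES (?, ?)")) {
                for (String[] row : rows) {
                    ps.setString(1, row[0]);
                    ps.setString(2, row[1]);
                    ps.executeUpdate(); // queued locally, not yet sent
                }
            }
            conn.commit(); // sends the buffered mutations to the servers
        }
    }
}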

--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
