[jira] [Created] (KUDU-2827) Backup should tombstone dropped tables

2019-05-24 Mike Percy (JIRA)
Mike Percy created KUDU-2827:


 Summary: Backup should tombstone dropped tables
 Key: KUDU-2827
 URL: https://issues.apache.org/jira/browse/KUDU-2827
 Project: Kudu
  Issue Type: Task
  Components: backup
Reporter: Mike Percy


It would be useful for backup to "tombstone" dropped tables so that the GC 
process can detect them and eventually consider them eligible for deletion, 
even though they are still on the restore path from the backup graph's perspective.
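A minimal sketch of the idea in Java, assuming a hypothetical in-memory model of the backup graph. `BackupEntry`, `BackupGraph`, `tombstoneDropped`, `eligibleForDeletion` and the retention window are illustrative names only, not the Kudu backup code. Each backup run appends one tombstone to the chain of any table that no longer exists, and GC treats a chain whose latest entry is a sufficiently old tombstone as deletable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of backup metadata: these names are illustrative, not the kudu-backup code.
final class BackupEntry {
    final long timestampMs;
    final boolean tombstone; // true means "the table was found dropped at timestampMs"

    BackupEntry(long timestampMs, boolean tombstone) {
        this.timestampMs = timestampMs;
        this.tombstone = tombstone;
    }
}

final class BackupGraph {
    // Per-table chain of backup entries, oldest first; a chain is never empty.
    private final Map<String, List<BackupEntry>> chains = new HashMap<>();

    void recordBackup(String table, long timestampMs) {
        chains.computeIfAbsent(table, t -> new ArrayList<>()).add(new BackupEntry(timestampMs, false));
    }

    // Called by a backup run: every table that has backups but is no longer live gets one
    // tombstone appended; tables that are already tombstoned are left unchanged.
    void tombstoneDropped(Set<String> liveTables, long nowMs) {
        for (Map.Entry<String, List<BackupEntry>> e : chains.entrySet()) {
            List<BackupEntry> chain = e.getValue();
            boolean alreadyTombstoned = chain.get(chain.size() - 1).tombstone;
            if (!liveTables.contains(e.getKey()) && !alreadyTombstoned) {
                chain.add(new BackupEntry(nowMs, true));
            }
        }
    }

    // Called by GC: a table's entire chain is eligible once its latest entry is a tombstone at
    // least retentionMs old, even though those backups are still on the table's restore path.
    Set<String> eligibleForDeletion(long nowMs, long retentionMs) {
        Set<String> eligible = new TreeSet<>();
        for (Map.Entry<String, List<BackupEntry>> e : chains.entrySet()) {
            List<BackupEntry> chain = e.getValue();
            BackupEntry last = chain.get(chain.size() - 1);
            if (last.tombstone && nowMs - last.timestampMs >= retentionMs) {
                eligible.add(e.getKey());
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        BackupGraph graph = new BackupGraph();
        graph.recordBackup("orders", 1_000L);
        graph.recordBackup("events", 1_000L);
        // By the next backup run (t=2000) "events" has been dropped from the cluster.
        graph.tombstoneDropped(Set.of("orders"), 2_000L);
        long retentionMs = 5_000L;
        System.out.println(graph.eligibleForDeletion(3_000L, retentionMs)); // prints []
        System.out.println(graph.eligibleForDeletion(7_000L, retentionMs)); // prints [events]
    }
}
```

Keeping the tombstone as a separate entry, rather than deleting the chain as soon as the drop is noticed, means the dropped table can still be restored until the (assumed) retention window has passed.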



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2557) Sometimes the rebalancer-related tests are running for too long

2019-05-24 Hao Hao (JIRA)


[ https://issues.apache.org/jira/browse/KUDU-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847733#comment-16847733 ]

Hao Hao commented on KUDU-2557:
---

Ah, ok, thanks for catching that!

> Sometimes the rebalancer-related tests are running for too long
> --
>
> Key: KUDU-2557
> URL: https://issues.apache.org/jira/browse/KUDU-2557
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, test
>Affects Versions: 1.8.0
>Reporter: Alexey Serbin
>Assignee: Alexey Serbin
>Priority: Minor
>  Labels: CLI, flaky-test, rebalance, test
> Fix For: 1.9.0
>
> Attachments: kudu-admin-test.2.txt
>
>
> The rebalancer-related tests in {{kudu-admin-test}} sometimes run wild and 
> take too long. This has been observed at least in RELEASE builds:
> {noformat}
> ConcurrentRebalancersTest.TwoConcurrentRebalancers/1: test_main.cc:63] Maximum unit test time exceeded (900 sec)
> {noformat}
> {noformat}
> TserverGoesDownDuringRebalancingTest.TserverDown/1: test_main.cc:63] Maximum unit test time exceeded (900 sec)
> {noformat}





[jira] [Assigned] (KUDU-2786) Parallelize tables for backup and restore

2019-05-24 Will Berkeley (JIRA)


[ https://issues.apache.org/jira/browse/KUDU-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Berkeley reassigned KUDU-2786:
---

Assignee: Will Berkeley

> Parallelize tables for backup and restore 
> --
>
> Key: KUDU-2786
> URL: https://issues.apache.org/jira/browse/KUDU-2786
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Grant Henke
>Assignee: Will Berkeley
>Priority: Major
>  Labels: backup
>
> Currently the backup and restore jobs process tables serially. This helps 
> ensure resources aren't over-allocated upfront, but it can be slower when 
> there are many small tables. Instead, we could run the per-table Spark jobs 
> in parallel. 
> It should be straightforward to use Scala futures to run multiple jobs in 
> parallel and check their status. We could add a configuration option to cap 
> the maximum number of tables processed at the same time, though that may not 
> be needed.
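The proposal above can be sketched roughly as follows. This sketch swaps in Java's `CompletableFuture` and a fixed-size thread pool for the Scala futures the issue mentions, and `ParallelTableRunner`, `runAll` and the `job` callback are hypothetical names: `job` stands in for whatever launches one table's backup or restore Spark job. The pool size plays the role of the optional cap on concurrently processed tables.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Sketch only: CompletableFuture plus a fixed-size pool stand in for Scala futures, and
// `job` is a hypothetical hook that would launch one table's backup or restore Spark job.
final class ParallelTableRunner {
    // Runs `job` once per table with at most `maxParallelTables` jobs in flight at a time.
    // Returns failures keyed by table name (empty if every table succeeded), so one bad
    // table does not abort the others.
    static Map<String, Throwable> runAll(List<String> tables, int maxParallelTables,
                                         Consumer<String> job) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallelTables);
        try {
            Map<String, CompletableFuture<Void>> futures = new LinkedHashMap<>();
            for (String table : tables) {
                futures.put(table, CompletableFuture.runAsync(() -> job.accept(table), pool));
            }
            Map<String, Throwable> failures = new LinkedHashMap<>();
            for (Map.Entry<String, CompletableFuture<Void>> entry : futures.entrySet()) {
                try {
                    entry.getValue().join(); // wait for this table's job and check its status
                } catch (CompletionException e) {
                    failures.put(entry.getKey(), e.getCause());
                }
            }
            return failures;
        } finally {
            pool.shutdown(); // release the worker threads once all jobs have been awaited
        }
    }

    public static void main(String[] args) {
        List<String> tables = List.of("small_1", "small_2", "small_3", "broken");
        Map<String, Throwable> failures = runAll(tables, 2, table -> {
            if (table.equals("broken")) {
                throw new IllegalStateException("simulated failure for " + table);
            }
            System.out.println("backed up " + table);
        });
        System.out.println("failed tables: " + failures.keySet());
    }
}
```

All jobs are submitted before the first `join()`, so waiting on the futures in order still lets every table run and collects every failure; passing the number of tables as `maxParallelTables` effectively removes the cap.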





[jira] [Commented] (KUDU-2670) Splitting more tasks for spark job, and add more concurrent for scan operation

2019-05-24 Xu Yao (JIRA)


[ https://issues.apache.org/jira/browse/KUDU-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847285#comment-16847285 ]

Xu Yao commented on KUDU-2670:
--

I will continue this work. If there is no pressing need for the C++ client, I 
will finish #1 first. :)

> Splitting more tasks for spark job, and add more concurrent for scan operation
> --
>
> Key: KUDU-2670
> URL: https://issues.apache.org/jira/browse/KUDU-2670
> Project: Kudu
>  Issue Type: Improvement
>  Components: java, spark
>Affects Versions: 1.8.0
>Reporter: yangz
>Assignee: Xu Yao
>Priority: Major
>  Labels: performance
>
> See KUDU-2437 (Split a tablet into primary key ranges by size).
> We need a Java client implementation to support the split-tablet scan 
> operation.
> We suggest two new implementations for the Java client:
>  # A ConcurrentKuduScanner, so that multiple scanners can read data at the 
> same time. This is useful in one case in particular: when we scan for only 
> one row but the predicate doesn't contain the primary key, we send many scan 
> requests but only one row is returned. Sending that many scan requests one by 
> one is slow, so we need a concurrent approach. In our tests of this case on a 
> 10G tablet, it saved a lot of time on a single machine.
>  # A way to split the work into more Spark tasks. To do so, we get scan 
> tokens in two steps: first we ask the tserver for key ranges, then we use 
> those ranges to get more scan tokens. In our usage a tablet is 10G, but we 
> split the work so that each task processes only 1G of data, which gives 
> better performance.
> All of these features have run well for us for half a year. We hope they 
> will be useful for the community.
>  
>  
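The second step of the two-step token generation described in the issue can be sketched in Java under stated assumptions: `KeyRange` and `RangeSplitter.pack` are hypothetical names, not Kudu Java client APIs, and step one is assumed to have already returned contiguous, key-ordered ranges with estimated sizes. The sketch greedily merges adjacent ranges into chunks of at most a target size (for example 1G); each resulting chunk would then back one scan token, and so one Spark task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for "a primary key range plus its estimated size", as step one
// (asking the tserver for ranges) might return it. Not the Kudu Java client API.
final class KeyRange {
    final String lower;    // inclusive encoded key
    final String upper;    // exclusive encoded key
    final long sizeBytes;  // estimated on-disk size of the range

    KeyRange(String lower, String upper, long sizeBytes) {
        this.lower = lower;
        this.upper = upper;
        this.sizeBytes = sizeBytes;
    }

    @Override
    public String toString() {
        return "[" + lower + ", " + upper + ") " + sizeBytes + "B";
    }
}

final class RangeSplitter {
    // Step two: greedily merge adjacent, contiguous ranges (in key order) into chunks of at most
    // targetBytes; a single range already larger than targetBytes becomes a chunk on its own.
    // Each chunk would then back one scan token, i.e. one Spark task.
    static List<KeyRange> pack(List<KeyRange> ordered, long targetBytes) {
        List<KeyRange> chunks = new ArrayList<>();
        KeyRange current = null;
        for (KeyRange r : ordered) {
            if (current != null && current.sizeBytes + r.sizeBytes <= targetBytes) {
                current = new KeyRange(current.lower, r.upper, current.sizeBytes + r.sizeBytes);
            } else {
                if (current != null) {
                    chunks.add(current);
                }
                current = r;
            }
        }
        if (current != null) {
            chunks.add(current);
        }
        return chunks;
    }

    public static void main(String[] args) {
        long halfGiB = 512L * 1024 * 1024;
        // A 10 GiB tablet reported as twenty contiguous 512 MiB ranges.
        List<KeyRange> ranges = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            ranges.add(new KeyRange(String.format("k%02d", i), String.format("k%02d", i + 1), halfGiB));
        }
        List<KeyRange> chunks = pack(ranges, 2 * halfGiB);
        System.out.println(chunks.size()); // prints 10
        System.out.println(chunks.get(0)); // prints [k00, k02) 1073741824B
    }
}
```

Merging only adjacent ranges keeps every chunk contiguous in key order, so each one maps to a single [lower, upper) scan range; a range that is already larger than the target is left whole rather than split further.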


