[GitHub] mikewalch commented on issue #347: ACCUMULO-4746 Fluent API for Mutation

2018-02-16 Thread GitBox
mikewalch commented on issue #347: ACCUMULO-4746 Fluent API for Mutation
URL: https://github.com/apache/accumulo/pull/347#issuecomment-366347764
 
 
   @bfach10, this is a pretty nice addition to Accumulo. It would be great to get 
it committed. Let me know if you need any help.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] mikewalch commented on a change in pull request #378: ACCUMULO-4800 Cache parsing of iterator config

2018-02-16 Thread GitBox
mikewalch commented on a change in pull request #378: ACCUMULO-4800 Cache 
parsing of iterator config
URL: https://github.com/apache/accumulo/pull/378#discussion_r168859266
 
 

 ##
 File path: 
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java
 ##
 @@ -186,12 +188,28 @@ public boolean isCurrent() {
 
 if (!loadIters) {
   return visFilter;
-} else if (null == options.getClassLoaderContext()) {
-  return iterEnv.getTopLevelIterator(IteratorUtil.loadIterators(IteratorScope.scan, visFilter, tablet.getExtent(), tablet.getTableConfiguration(),
-  options.getSsiList(), options.getSsio(), iterEnv));
 } else {
-  return iterEnv.getTopLevelIterator(IteratorUtil.loadIterators(IteratorScope.scan, visFilter, tablet.getExtent(), tablet.getTableConfiguration(),
-  options.getSsiList(), options.getSsio(), iterEnv, true, options.getClassLoaderContext()));
+  List<IterInfo> iterInfos;
+  Map<String,Map<String,String>> iterOpts;
 
 Review comment:
   Can you describe what is going on in this block?




[GitHub] mikewalch commented on a change in pull request #378: ACCUMULO-4800 Cache parsing of iterator config

2018-02-16 Thread GitBox
mikewalch commented on a change in pull request #378: ACCUMULO-4800 Cache 
parsing of iterator config
URL: https://github.com/apache/accumulo/pull/378#discussion_r168858373
 
 

 ##
 File path: 
server/base/src/main/java/org/apache/accumulo/server/conf/TableConfiguration.java
 ##
 @@ -144,4 +163,50 @@ public String toString() {
   public long getUpdateCount() {
     return parent.getUpdateCount() + getPropCacheAccessor().getZooCache().getUpdateCount();
   }
+
+  public static class ParsedIteratorConfig {
+    private final List<IterInfo> tableIters;
+    private final Map<String,Map<String,String>> tableOpts;
+    private final String context;
+    private final long updateCount;
+
+    private ParsedIteratorConfig(List<IterInfo> ii, Map<String,Map<String,String>> opts, String context, long updateCount) {
+      this.tableIters = ImmutableList.copyOf(ii);
+      Builder<String,Map<String,String>> imb = ImmutableMap.builder();
+      for (Entry<String,Map<String,String>> entry : opts.entrySet()) {
+        imb.put(entry.getKey(), ImmutableMap.copyOf(entry.getValue()));
+      }
+      tableOpts = imb.build();
+      this.context = context;
+      this.updateCount = updateCount;
+    }
+
+    public List<IterInfo> getIterInfo() {
+      return tableIters;
+    }
+
+    public Map<String,Map<String,String>> getOpts() {
+      return tableOpts;
+    }
+
+    public String getContext() {
+      return context;
+    }
+  }
+
+  public ParsedIteratorConfig getParsedIteratorConfig(IteratorScope scope) {
+    long count = getUpdateCount();
 
 Review comment:
   I see that the iterator config will change if this count changes.  When does 
this count change?
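
For readers following the thread, here is a minimal sketch of the caching idea under review. It assumes the parsed config is stored together with the update count it was built from and is re-parsed only when the observed count differs. UpdateCountCache and its members are hypothetical stand-ins for illustration, not the code in the pull request.

```java
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for the caching pattern; not Accumulo code.
final class UpdateCountCache<T> {

  // Immutable pair: a parsed value and the update count it was built at.
  private static final class Snapshot<T> {
    final T value;
    final long updateCount;

    Snapshot(T value, long updateCount) {
      this.value = value;
      this.updateCount = updateCount;
    }
  }

  // volatile so readers always see a fully constructed snapshot.
  private volatile Snapshot<T> cached;

  // Re-run the (possibly expensive) parser only when the count has changed.
  // Two racing threads may both parse; the result is the same either way.
  T get(long currentCount, Supplier<T> parser) {
    Snapshot<T> snap = cached;
    if (snap == null || snap.updateCount != currentCount) {
      snap = new Snapshot<>(parser.get(), currentCount);
      cached = snap;
    }
    return snap.value;
  }
}
```

With this shape, repeated calls at the same count reuse the cached object, and the first call after the count moves pays the parsing cost once.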




[jira] [Updated] (ACCUMULO-4820) Cleanup code for 2.0

2018-02-16 Thread Michael Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Miller updated ACCUMULO-4820:
-
Description: 
Running IntelliJ code inspect picks up a lot of minor code clean up fixes that 
would be nice to fix sooner rather than later.
Java 5 Updates 
- unnecessary boxing
- use foreach where possible

Java 7 Updates (these should definitely be fixed)
- Explicit type can be replaced with <> (aka diamond operator)
- Identical catch branches in try 
- try finally replaceable with try with resources

Java 8 Updates (these would be nice, but maybe some prefer older way?)
- Replace anonymous types with lambda
- Replace code with new single Map method call
- Replace Collections.sort() with List.sort()

Other miscellaneous performance issues picked up by the inspector.  I think these should 
definitely be fixed, but perhaps in a sub-ticket.
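
For illustration, the inspections listed above roughly correspond to rewrites like the following. This is a sketch only: the class and method names are hypothetical and none of it is taken from the Accumulo tree.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical examples of the cleanups; not code from Accumulo.
final class CleanupExamples {

  // Java 5: foreach loop, and a primitive accumulator instead of a boxed Integer.
  static int sum(List<Integer> values) {
    int total = 0;
    for (int v : values) {
      total += v;
    }
    return total;
  }

  // Java 7: diamond operator instead of repeating the type arguments.
  // Java 8: Map.computeIfAbsent replaces a get / null check / put sequence.
  static Map<String,List<Integer>> group(String[] keys, int[] values) {
    Map<String,List<Integer>> grouped = new HashMap<>();
    for (int i = 0; i < keys.length; i++) {
      grouped.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
    }
    return grouped;
  }

  // Java 7: try-with-resources replaces try/finally with an explicit close(),
  // and multi-catch merges two identical catch branches.
  static int parseFirstLine(String text) {
    try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
      String line = reader.readLine();
      return Integer.parseInt(line == null ? "" : line.trim());
    } catch (IOException | NumberFormatException e) {
      return -1;
    }
  }

  // Java 8: List.sort with a lambda replaces Collections.sort with an
  // anonymous Comparator class.
  static List<String> sortedByLength(List<String> words) {
    List<String> copy = new ArrayList<>(words);
    copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
    return copy;
  }
}
```

Each rewrite keeps behavior the same, which is why these are good candidates for mechanical cleanup.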

  was:
Running IntelliJ code inspect picks up a lot of minor code clean up fixes that 
would be nice to fix sooner rather than later.
Java 7 Updates (these should definitely be fixed)
- Explicit type can be replaced with <> (aka diamond operator)
- Identical catch branches in try 
- try finally replaceable with try with resources

Java 8 Updates (these would be nice, but maybe some prefer older way?)
- Replace anonymous types with lambda
- Replace code with new single Map method call
- Replace Collections.sort() with List.sort()

Other miscellaneous performance issues picked up by the inspector.  I think these should 
definitely be fixed, but perhaps in a sub-ticket.


> Cleanup code for 2.0
> 
>
> Key: ACCUMULO-4820
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4820
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Miller
>Priority: Minor
> Fix For: 2.0.0
>
>
> Running IntelliJ code inspect picks up a lot of minor code clean up fixes 
> that would be nice to fix sooner rather than later.
> Java 5 Updates 
> - unnecessary boxing
> - use foreach where possible
> Java 7 Updates (these should definitely be fixed)
> - Explicit type can be replaced with <> (aka diamond operator)
> - Identical catch branches in try 
> - try finally replaceable with try with resources
> Java 8 Updates (these would be nice, but maybe some prefer older way?)
> - Replace anonymous types with lambda
> - Replace code with new single Map method call
> - Replace Collections.sort() with List.sort()
> Other miscellaneous performance issues picked up by the inspector.  I think these 
> should definitely be fixed, but perhaps in a sub-ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ACCUMULO-4820) Cleanup code for 2.0

2018-02-16 Thread Michael Miller (JIRA)
Michael Miller created ACCUMULO-4820:


 Summary: Cleanup code for 2.0
 Key: ACCUMULO-4820
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4820
 Project: Accumulo
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Michael Miller
 Fix For: 2.0.0


Running IntelliJ code inspect picks up a lot of minor code clean up fixes that 
would be nice to fix sooner rather than later.
Java 7 Updates (these should definitely be fixed)
- Explicit type can be replaced with <> (aka diamond operator)
- Identical catch branches in try 
- try finally replaceable with try with resources

Java 8 Updates (these would be nice, but maybe some prefer older way?)
- Replace anonymous types with lambda
- Replace code with new single Map method call
- Replace Collections.sort() with List.sort()

Other miscellaneous performance issues picked up by the inspector.  I think these should 
definitely be fixed, but perhaps in a sub-ticket.





[jira] [Commented] (ACCUMULO-4808) Add splits to table at table creation.

2018-02-16 Thread Mark Owens (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367311#comment-16367311
 ] 

Mark Owens commented on ACCUMULO-4808:
--

[~kturner], the second option may be the way to go. After talking with 
[~etcoleman] yesterday, one of the big issues is the need to have the newly 
split tablets nicely distributed across the cluster. Currently that is achieved 
by adding the splits, taking the table offline, and bringing it back online.

> Add splits to table at table creation.
> --
>
> Key: ACCUMULO-4808
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4808
> Project: Accumulo
>  Issue Type: New Feature
>  Components: master, tserver
>Reporter: Mark Owens
>Assignee: Mark Owens
>Priority: Major
> Fix For: 2.0.0
>
>
> Add capability to add table splits at table creation. Recent changes now 
> allow iterator and locality groups to be created at table creation. Do the 
> same with splits. Comment below from 
> [ACCUMULO-4806|https://issues.apache.org/jira/browse/ACCUMULO-4806] explains 
> the motivation for the request:
> {quote}[~etcoleman] added a comment - 2 hours ago
> It would go a long way if the splits could be added at table creation or 
> when the table is offline.  When the other API changes were made by Mark, I 
> wondered if this task could also be done at that time - but I believe 
> that it was more complicated.
> The delay is that when a table is created and then the splits added and then 
> taken offline there is a period proportional to the number of splits as they 
> are off-loaded from the tserver where they originally got assigned.  (The 
> re-online with splits distributed across the cluster is quite fast)
> If the splits could be added at table creation, or while the table is offline 
> so that the delay for shedding the tablets could be avoided, then the need to 
> perform the actual import offline would not be as necessary.
>  
> {quote}
>  





[jira] [Commented] (ACCUMULO-4813) Accepting mapping file for bulk import

2018-02-16 Thread Keith Turner (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366690#comment-16366690
 ] 

Keith Turner commented on ACCUMULO-4813:


[~m-hogue] it would be additive. I think it would be nice to deprecate the 
current bulk import process in favor of this.  This could be done by adding 
new APIs to do bulk import with a mapping file and deprecating the current APIs.

> Accepting mapping file for bulk import
> --
>
> Key: ACCUMULO-4813
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4813
> Project: Accumulo
>  Issue Type: Sub-task
>Reporter: Keith Turner
>Priority: Major
> Fix For: 2.0.0
>
>
> During bulk import, inspecting files to determine where they go is expensive 
> and slow.  In order to spread the cost, Accumulo has an internal mechanism to 
> spread the work of inspecting files to random tablet servers.  Because this 
> internal process takes time and consumes resources on the cluster, users want 
> control over it.  The best way to give this control may be to externalize it 
> by allowing bulk imports to have a mapping file.  This mapping file would 
> specify the ranges where files should be loaded.  If Accumulo provided API to 
> help produce this file, then that work could be done in Map Reduce or Spark.  
> This would give users all the control they want over when and where this 
> computation is done.  This would naturally fit in the process used to create 
> the bulk files. 
> To make bulk import fast this mapping file should have the following 
> properties.
>  * Key in file is a range
>  * Value in file is a list of files
>  * Ranges are non overlapping
>  * File is sorted by range/key
>  * Has a mapping for every non-empty file in the bulk import directory.
> If Accumulo provides APIs to do the following operations, then producing the 
> file could be written as a map/reduce job.
>  * For a given rfile produce a list of row ranges where the file should be 
> loaded.  These row ranges would be based on tablets.
>  * Merge (row range, list of files) pairs
>  * Serialize (row range, list of files) pairs
> With a mapping file, the bulk import algorithm could be written as follows.  
> This could all be executed in the master with no need to run inspection task 
> on random tablet servers.
>  * Sanity check file
>  ** Ensure in sorted order
>  ** Ensure ranges are non-overlapping
>  ** Ensure each file in directory has at least one entry in file
>  ** Ensure all splits in the file exist in the table.
>  * Since file is sorted can do a merged read of file and metadata table, 
> looping over the following operations for each tablet until all files are 
> loaded.
>  ** Read the loaded files for the tablet
>  ** Read the files to load for the range
>  ** For any files not loaded, send an async load message to the tablet server
> The above algorithm can just keep scanning the metadata table and sending 
> async load messages until the bulk import is complete.  Since the load 
> messages are async, the bulk load of a large number of files could 
> potentially be very fast.
> The bulk load operation can easily handle the case of tablets splitting 
> during the operation by matching a single range in the file to multiple 
> tablets.  However attempting to handle merges would be a lot more tricky.  It 
> would probably be simplest to fail the operation if a merge is detected.  The 
> nice thing is that this can be done in a very clean way.   Once the bulk 
> import operation has the table lock, merges can not happen.  So after getting 
> the table lock the bulk import operation can ensure all splits in the file 
> exist in the table. The operation can abort if the condition is not met 
> before doing any work.  If this condition is not met, it indicates a merge 
> happened between generating the mapping file and doing the bulk import.
> Hopefully the mapping file plus the algorithm that sends async load messages 
> can dramatically speed up bulk import operations.  This may lessen the need 
> for other things like prioritizing bulk import.  To measure this, it would be 
> very useful to create a bulk import performance test that can create many files 
> with very little data and measure the time it takes to load them.
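
The "sanity check file" step in the description above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: rows are modeled as Java Strings compared lexicographically (Accumulo rows are byte sequences), ranges are modeled like tablet extents as (prevEndRow, endRow] with null meaning unbounded, the directory listing is assumed to contain only non-empty files, and MappingFileCheck, RowRange, and Entry are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the mapping file sanity checks; not Accumulo code.
final class MappingFileCheck {

  // Row range (prevEndRow, endRow]; a null bound means unbounded.
  static final class RowRange {
    final String prevEndRow;
    final String endRow;

    RowRange(String prevEndRow, String endRow) {
      this.prevEndRow = prevEndRow;
      this.endRow = endRow;
    }
  }

  // One mapping entry: a range and the files to load into it.
  static final class Entry {
    final RowRange range;
    final List<String> files;

    Entry(String prevEndRow, String endRow, String... files) {
      this.range = new RowRange(prevEndRow, endRow);
      this.files = Arrays.asList(files);
    }
  }

  // Returns the problems found; an empty list means the mapping passed.
  static List<String> check(List<Entry> mapping, Set<String> filesInDir, Set<String> tableSplits) {
    List<String> problems = new ArrayList<>();
    Set<String> mapped = new HashSet<>();
    RowRange prev = null;
    for (Entry e : mapping) {
      RowRange r = e.range;
      if (r.prevEndRow != null && r.endRow != null && r.prevEndRow.compareTo(r.endRow) >= 0) {
        problems.add("empty or inverted range ending at " + r.endRow);
      }
      // Sorted and non-overlapping: each range starts at or after the previous end.
      if (prev != null && (prev.endRow == null || r.prevEndRow == null
          || prev.endRow.compareTo(r.prevEndRow) > 0)) {
        problems.add("range ending at " + r.endRow + " is out of order or overlaps");
      }
      // Every finite boundary must be an existing split; if not, a merge happened.
      for (String bound : new String[] {r.prevEndRow, r.endRow}) {
        if (bound != null && !tableSplits.contains(bound)) {
          problems.add("split " + bound + " does not exist in the table");
        }
      }
      mapped.addAll(e.files);
      prev = r;
    }
    // Each file in the directory needs at least one entry.
    for (String file : filesInDir) {
      if (!mapped.contains(file)) {
        problems.add("no mapping for file " + file);
      }
    }
    return problems;
  }
}
```

Because the checks only need the mapping, the directory listing, and the current splits, they could run in one place before any load messages are sent, which matches the "abort before doing any work" idea in the description.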


