[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports

2019-04-19 Thread Michael Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821860#comment-16821860
 ] 

Michael Miller commented on ACCUMULO-4806:
--

I believe this is complete. Any follow-on work can be tracked in a GitHub issue.

> Allow offline bulk imports
> --
>
> Key: ACCUMULO-4806
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4806
> Project: Accumulo
>  Issue Type: Sub-task
>  Components: master, tserver
>Reporter: Mark Owens
>Assignee: Michael Miller
>Priority: Major
> Fix For: 2.0.0
>
>
> Allowing offline bulk imports would be useful for some customers. Currently 
> these customers already take tables offline to set split points but then have 
> to bring them back online before starting the import.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports

2018-02-15 Thread Mark Owens (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365891#comment-16365891
 ] 

Mark Owens commented on ACCUMULO-4806:
--

[~etcoleman] There is some discussion about modifying the way bulk imports work 
in general. The proposed change is described in ticket 
[ACCUMULO-4813|https://issues.apache.org/jira/browse/ACCUMULO-4813]. Can you 
read over that ticket and provide some thoughts from your perspective? The 
change would give users control over how bulk imports are done. It is thought 
that, if it were pursued, it would remove the need for some of the other 
recently created bulk import sub-tasks. For example, if the bulk import process 
time could be dropped to under a minute, pause/resume would not really be 
needed. 



[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports

2018-02-15 Thread Ed Coleman (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365467#comment-16365467
 ] 

Ed Coleman commented on ACCUMULO-4806:
--

[~kturner] Whether the import is performed online or offline, as is being 
discussed, does not really matter, but how would import errors be handled if 
the import were done offline? Would you still end up with files in the error 
directory as part of the bulk import, or after the table is brought online? 
What is the failure reporting mechanism?



[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports

2018-02-15 Thread Ed Coleman (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365452#comment-16365452
 ] 

Ed Coleman commented on ACCUMULO-4806:
--

[~milleruntime] It would save time on the total import if steps 1 and 2 were 
done before the RFiles were ready; this also applies to the current steps 
1-4.
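To make the timing argument concrete, here is a minimal Python sketch with 
made-up durations (none of these numbers come from this thread). It assumes 
that table preparation (creating the table and adding splits, i.e. steps 1 and 
2) does not depend on the RFiles and can therefore overlap with RFile 
generation, while the import and the online step must wait for both. The 
function names are hypothetical.

```python
def serial_time(prep, generate, bulk_import, online):
    """Wall-clock time when every step waits for the previous one."""
    return prep + generate + bulk_import + online


def overlapped_time(prep, generate, bulk_import, online):
    """Wall-clock time when table preparation (create table + add splits)
    runs while the RFiles are still being generated."""
    return max(prep, generate) + bulk_import + online


# Hypothetical durations in minutes, for illustration only.
prep, generate, bulk_import, online = 10, 30, 5, 2
print(serial_time(prep, generate, bulk_import, online))      # 47
print(overlapped_time(prep, generate, bulk_import, online))  # 37
```

Under these assumptions the saving is min(prep, generate): as long as the 
preparation finishes before the RFiles are ready, it drops off the critical 
path entirely, even if the four steps together take as long as today's import.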

Part of this issue for me is that the tserver seems to be doing a lot of work 
to offline the tablets.



[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports

2018-02-14 Thread Keith Turner (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364493#comment-16364493
 ] 

Keith Turner commented on ACCUMULO-4806:


For the possible workflow I mentioned, I was thinking the offline bulk import 
could use the mapping file mentioned in ACCUMULO-4813. I think this could make 
that entire sequence of operations very fast.
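The mapping-file idea can be illustrated with a small, hypothetical Python 
sketch. It does not reproduce the ACCUMULO-4813 file format or any Accumulo 
API; it only shows the computation such a mapping captures: given a table's 
split points and each RFile's first and last row, determine which tablets the 
file overlaps. It assumes Accumulo's tablet boundaries, where a tablet's end 
row is inclusive and the previous tablet's end row is exclusive. The file names 
and function names are made up.

```python
from bisect import bisect_left


def tablet_index(splits, row):
    """Index of the tablet holding `row` (splits must be sorted).

    Tablet i covers (splits[i-1], splits[i]]; the last tablet covers
    everything after the final split.
    """
    return bisect_left(splits, row)


def map_files_to_tablets(splits, files):
    """Map each file name to the list of tablet indexes its rows overlap.

    `files` maps a (hypothetical) file name to its (first_row, last_row).
    """
    splits = sorted(splits)
    mapping = {}
    for name, (first_row, last_row) in files.items():
        start = tablet_index(splits, first_row)
        end = tablet_index(splits, last_row)
        mapping[name] = list(range(start, end + 1))
    return mapping


splits = ["g", "p"]  # three tablets: (-inf, g], (g, p], (p, +inf)
files = {"f1.rf": ("a", "c"), "f2.rf": ("f", "h"), "f3.rf": ("q", "z")}
print(map_files_to_tablets(splits, files))
# {'f1.rf': [0], 'f2.rf': [0, 1], 'f3.rf': [2]}
```

If a mapping like this is produced when the RFiles are written, the import 
only needs to record the assignments instead of inspecting every file, which 
is one plausible reason the whole sequence could be fast.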



[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports

2018-02-14 Thread Michael Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364490#comment-16364490
 ] 

Michael Miller commented on ACCUMULO-4806:
--

Also, would it save time even if the 4 steps Keith mentioned above took the 
same total time as the current bulk import? Say, if you could complete steps 1 
and 2 ahead of time, before the data arrives?



[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports

2018-02-14 Thread Keith Turner (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364471#comment-16364471
 ] 

Keith Turner commented on ACCUMULO-4806:


[~etcoleman] if create table supported creating an offline table, would the 
following workflow be useful?
 * Create offline table
 * Add splits to offline table
 * Bulk import to offline table
 * Bring table online
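For steps 1 and 2 of this workflow, adding splits to an offline table 
presumably amounts to defining its tablet boundaries without any tserver 
hosting them yet. The Python sketch below is an illustration only, not 
Accumulo code: it computes the (prevEndRow, endRow] extents that a given set of 
split points produces, using None for an unbounded start or end. The function 
name is hypothetical.

```python
def tablet_extents(splits):
    """Return (prev_end_row, end_row) pairs for the tablets created by `splits`.

    Each tablet holds rows r with prev_end_row < r <= end_row;
    None stands for an unbounded start or end.
    """
    boundaries = sorted(set(splits))
    extents = []
    prev = None
    for end in boundaries:
        extents.append((prev, end))
        prev = end
    extents.append((prev, None))  # the last tablet has no end row
    return extents


print(tablet_extents(["p", "g"]))
# [(None, 'g'), ('g', 'p'), ('p', None)]
```

With n distinct split points the table has n + 1 tablets. Once the table is 
online, each of them must be hosted by some tserver; under this workflow that 
assignment would only happen in the final step.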



[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports

2018-02-09 Thread Ed Coleman (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359072#comment-16359072
 ] 

Ed Coleman commented on ACCUMULO-4806:
--

I am not aware of any issues. From the client's perspective, it would actually 
simplify operations. 

Adding the splits at table creation would make preparing a new, pre-split 
table for bulk import simpler, because it could be done in one client 
operation rather than the current four: create table, add splits, offline 
table, online table. Being able to add splits to an offline table would still 
require the same number of client operations (create table, offline table, add 
splits, online table); just the order is different.

The current complexity is manageable, so I would defer to whichever approach 
is easier to implement and provides equivalent performance and processing 
requirements. 

For performance, I'm measuring total wall-clock time from when the table is 
created to when it is ready for the bulk import to begin.

For processing, I'm assuming that the processing required for adding splits at 
table creation or to an offline table would be equivalent, and that one would 
not require significantly more processing power than the other. This matters 
especially if the operations are performed by the master, where they could 
impact other master operations.

So, from a client perspective, I think adding the splits at creation time 
would be preferred. 

 



[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports

2018-02-09 Thread Michael Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359025#comment-16359025
 ] 

Michael Miller commented on ACCUMULO-4806:
--

Mark created one for splits as well: 
https://issues.apache.org/jira/browse/ACCUMULO-4808



[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports

2018-02-09 Thread Keith Turner (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359022#comment-16359022
 ] 

Keith Turner commented on ACCUMULO-4806:


[~etcoleman] do you know if there is an issue for supplying splits at table 
creation time?  I have some ideas about how it could be implemented.



[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports

2018-02-09 Thread Ed Coleman (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358580#comment-16358580
 ] 

Ed Coleman commented on ACCUMULO-4806:
--

It would go a long way if the splits could be added at table creation or while 
the table is offline. When Mark made the other API changes, I wondered if this 
task could also be done at that time, but I believe it was more complicated.

The delay is this: when a table is created, the splits are added, and the table 
is then taken offline, there is a period proportional to the number of splits 
while the tablets are unloaded from the tserver they were originally assigned 
to. (Bringing the table back online, with the tablets distributed across the 
cluster, is quite fast.)
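A rough way to see why the offline step dominates is the following Python 
sketch. Its cost model is a deliberate simplification and an assumption, not a 
measurement: all n + 1 tablets produced by splitting start on the one tserver 
that hosted the original tablet and are unloaded one at a time, while bringing 
the table back online spreads the loads evenly across m tservers working in 
parallel. The per-tablet costs and function names are placeholders.

```python
import math


def offline_time(num_splits, unload_cost):
    """One tserver unloads all num_splits + 1 tablets sequentially."""
    return (num_splits + 1) * unload_cost


def online_time(num_splits, num_tservers, load_cost):
    """Tablets are spread evenly across tservers that load in parallel."""
    return math.ceil((num_splits + 1) / num_tservers) * load_cost


# Placeholder numbers: 999 splits, 20 tservers, 1 time unit per tablet.
print(offline_time(999, 1))     # 1000
print(online_time(999, 20, 1))  # 50
```

Under this model the offline delay grows linearly with the number of splits no 
matter how large the cluster is, which is the delay that adding splits at 
creation time, or while the table is offline, would avoid.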

If the splits could be added at table creation, or while the table is offline, 
so that the delay for shedding the tablets could be avoided, then performing 
the actual import offline would be less necessary.
