[jira] [Commented] (ACCUMULO-4806) Allow offline bulk imports
[ https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821860#comment-16821860 ] Michael Miller commented on ACCUMULO-4806: --
I believe this is complete. Any follow-on work can be tracked in a GitHub issue.

> Allow offline bulk imports
> --
>
> Key: ACCUMULO-4806
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4806
> Project: Accumulo
> Issue Type: Sub-task
> Components: master, tserver
> Reporter: Mark Owens
> Assignee: Michael Miller
> Priority: Major
> Fix For: 2.0.0
>
>
> Allowing offline bulk imports would be useful for some customers. Currently
> these customers already take tables offline to set split points but then have
> to bring them back online before starting the import.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[ https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365891#comment-16365891 ] Mark Owens commented on ACCUMULO-4806: --
[~etcoleman] There is some discussion about changing the way bulk imports work in general. The proposed change is described in ticket [ACCUMULO-4813|https://issues.apache.org/jira/browse/ACCUMULO-4813]. Can you read over that ticket and share your thoughts from your perspective? The change would give users control over how bulk imports are done. The thinking is that, if it were pursued, it would remove the need for some of the other recently created bulk import sub-tasks. For example, if the bulk import process time could be dropped to under a minute, pause/resume would not really be needed.
[ https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365467#comment-16365467 ] Ed Coleman commented on ACCUMULO-4806: --
[~kturner] Whether the import is performed online or offline, as is being discussed, does not really matter, but how would import errors be handled if the import were done offline? Would you still end up with files in the error directory as part of the bulk import, or only after the table is brought online? What is the failure reporting mechanism?
[ https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365452#comment-16365452 ] Ed Coleman commented on ACCUMULO-4806: --
[~milleruntime] It would reduce the total import time if steps 1 and 2 were done before the RFiles were ready; this also applies to the current steps 1 through 4. Part of the issue for me is that the tserver seems to be doing a lot of work to take the tablets offline.
[ https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364493#comment-16364493 ] Keith Turner commented on ACCUMULO-4806:
For the possible workflow I mentioned, I was thinking the offline bulk import could use the mapping file mentioned in ACCUMULO-4813. I think this could make that entire sequence of operations very fast.
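A mapping file of the kind referenced in ACCUMULO-4813 presumably records, for each RFile, which tablets its rows overlap, so that work can be done before the import starts. The following is a minimal sketch of that computation only, under stated assumptions: each file is summarized by its first and last row, rows are modeled as Strings for simplicity (Accumulo rows are byte sequences), and a tablet is identified by its end row and holds the rows after the previous split up to and including its end row, with `null` standing for the last, unbounded tablet. `TabletMapping` and `overlappingTablets` are hypothetical names; this is not the ACCUMULO-4813 file format or Accumulo's own implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class TabletMapping {
    // Hypothetical helper: returns the end rows of every tablet whose row range
    // (prevEndRow, endRow] overlaps the file's rows [firstRow, lastRow].
    // A null entry stands for the last tablet, whose end row is unbounded.
    public static List<String> overlappingTablets(NavigableSet<String> splits,
                                                  String firstRow, String lastRow) {
        List<String> result = new ArrayList<>();
        // The first split >= firstRow is the end row of the tablet containing firstRow;
        // each following split ends the next tablet in row order.
        for (String endRow : splits.tailSet(firstRow, true)) {
            result.add(endRow);
            if (endRow.compareTo(lastRow) >= 0) {
                return result; // this tablet also contains lastRow, so stop here
            }
        }
        // lastRow is past every split, so the file reaches the final tablet.
        result.add(null);
        return result;
    }

    public static void main(String[] args) {
        // Split points g, n, t give four tablets: (-inf, g], (g, n], (n, t], (t, +inf)
        NavigableSet<String> splits = new TreeSet<>(List.of("g", "n", "t"));
        System.out.println(overlappingTablets(splits, "a", "c")); // [g]
        System.out.println(overlappingTablets(splits, "h", "p")); // [n, t]
        System.out.println(overlappingTablets(splits, "s", "z")); // [t, null]
    }
}
```

Because a tablet's end row is inclusive, a file whose last row falls exactly on a split point is assigned only to the tablet ending there, which is why the loop stops at the first end row that is greater than or equal to `lastRow`.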
[ https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364490#comment-16364490 ] Michael Miller commented on ACCUMULO-4806: --
Also, would it save time even if the four steps Keith mentioned above took the same total time as the current bulk import? For example, if you could complete steps 1 and 2 ahead of time, before the data arrives?
[ https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364471#comment-16364471 ] Keith Turner commented on ACCUMULO-4806:
[~etcoleman] If create table supported creating an offline table, would the following workflow be useful?
* Create offline table
* Add splits to offline table
* Bulk import to offline table
* Bring table online
[ https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359072#comment-16359072 ] Ed Coleman commented on ACCUMULO-4806: --
I am not aware of any issues. From the client's perspective, it would actually simplify operations. Adding the splits at table creation would make it simpler to prepare a new, pre-split table for bulk import, because it could be done in one client operation rather than the current four: create table, add splits, offline table, online table. Being able to add splits to an offline table would still require the same number of client operations (create table, offline table, add splits, online table); only the order is different. The current complexity is manageable, so I would defer to whichever approach is easier to implement and provides equivalent performance and processing requirements. For performance, I'm measuring total wall-clock time from when the table is created to when it is ready for the bulk import to begin. For processing, I'm assuming that adding splits at table creation and adding them to an offline table would require about the same processing, and that neither would require significantly more processing power than the other. This matters especially if the operations are performed by the master, where they could impact other master operations. So, from a client perspective, I think adding the splits at creation time would be preferred.
[ https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359025#comment-16359025 ] Michael Miller commented on ACCUMULO-4806: --
Mark created one for splits as well: https://issues.apache.org/jira/browse/ACCUMULO-4808
[ https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359022#comment-16359022 ] Keith Turner commented on ACCUMULO-4806:
[~etcoleman] Do you know if there is an issue for supplying splits at table creation time? I have some ideas about how it could be implemented.
[ https://issues.apache.org/jira/browse/ACCUMULO-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358580#comment-16358580 ] Ed Coleman commented on ACCUMULO-4806: --
It would go a long way if the splits could be added at table creation or while the table is offline. When Mark made the other API changes, I wondered whether this could also be done at that time, but I believe it was more complicated. The delay is this: when a table is created, the splits are added, and the table is then taken offline, there is a period proportional to the number of splits while the tablets are unloaded from the tserver where they were originally assigned. (Bringing the table back online, with the tablets distributed across the cluster, is quite fast.) If the splits could be added at table creation, or while the table is offline, the delay for shedding the tablets could be avoided, and performing the actual import offline would be less necessary.