Issue 1183 in ganeti: gnt-backup export failure

2016-06-24 Thread ganeti

Status: New
Owner: 

New issue 1183 by cr...@rubensteintech.com: gnt-backup export failure
https://code.google.com/p/ganeti/issues/detail?id=1183

What software version are you running? Please provide the output of
"gnt-cluster --version", "gnt-cluster version", and "hspace --version".
# gnt-cluster --version
gnt-cluster (ganeti v2.14.2-177-g003cd9a) 2.15.2

[root@ocean ~]# gnt-cluster version
Software version: 2.15.2
Internode protocol: 215
Configuration format: 215
OS api version: 20
Export interface: 0
VCS version: (ganeti) version v2.14.2-177-g003cd9a

# hspace --version
hspace (ganeti) version v2.14.2-177-g003cd9a
compiled with ghc 7.6
running on linux x86_64

What distribution are you using?
# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

What steps will reproduce the problem?
1. gnt-backup export -n node1 instance1

What is the expected output? What do you see instead?
Backup complete message. Instead:
# gnt-backup export -d -n node1 instance1
2016-06-24 17:05:01,989: gnt-backup export pid=179732 cli:1218 DEBUG Command line: gnt-backup export -d -n node1 instance1
Fri Jun 24 17:05:02 2016 Shutting down instance instance1.intranet.domain.com
Fri Jun 24 17:05:07 2016 Creating a snapshot of disk/0 on node node1.intranet.domain.com
Fri Jun 24 17:05:07 2016 Starting instance instance1.intranet.domain.com
Fri Jun 24 17:05:08 2016 Exporting snapshot/0 from node1.intranet.domain.com to node1.intranet.domain.com
Fri Jun 24 17:05:11 2016 snapshot/0 is now listening, starting export
Fri Jun 24 17:05:13 2016 snapshot/0 finished receiving data
Fri Jun 24 17:05:13 2016  - WARNING: export 'export-disk0-2016-06-24_17_05_12-YArFHA' on node1.intranet.domain.com failed: Exited with status 1
Fri Jun 24 17:05:13 2016 snapshot/0 failed to send data: Exited with status 1 (recent output: )
Fri Jun 24 17:05:13 2016 Removing snapshot of disk/0 on node node1.intranet.domain.com
Fri Jun 24 17:05:14 2016  - WARNING: Some disk exports have failed; there may be leftover data for instance instance1.intranet.domain.com on node node1.intranet.domain.com
2016-06-24 17:05:14,120: gnt-backup export pid=179732 cli:1225 ERROR Error during command processing

Traceback (most recent call last):
  File "/usr/share/ganeti/2.15/ganeti/cli.py", line 1221, in GenericMain
    result = func(options, args)
  File "/usr/share/ganeti/2.15/ganeti/client/gnt_backup.py", line 116, in ExportInstance
    SubmitOrSend(op, opts)
  File "/usr/share/ganeti/2.15/ganeti/cli.py", line 1011, in SubmitOrSend
    return SubmitOpCode(op, cl=cl, feedback_fn=feedback_fn, opts=opts)
  File "/usr/share/ganeti/2.15/ganeti/cli.py", line 976, in SubmitOpCode
    reporter=reporter)
  File "/usr/share/ganeti/2.15/ganeti/cli.py", line 955, in PollJob
    return GenericPollJob(job_id, _LuxiJobPollCb(cl), reporter)
  File "/usr/share/ganeti/2.15/ganeti/cli.py", line 777, in GenericPollJob
    errors.MaybeRaise(msg)
  File "/usr/share/ganeti/2.15/ganeti/errors.py", line 531, in MaybeRaise
    raise errcls(*args)
OpExecError: Export failed, errors in export finalization, disk export: disk(s) 0

Failure: command execution error:
Export failed, errors in export finalization, disk export: disk(s) 0


Please provide any additional information below.
I would be happy to test anything else to determine what the cause may be.



Re: [PATCH master 0/6] New balancing options implementation

2016-06-24 Thread Oleg Ponomarev
Hi Iustin,

> I'll look at the patches, but if I read correctly, these are currently
> stored as tags. Would it make more sense to have them as proper values in
> the objects, so that (in the future) they can be used by other parts of the
> code? Just a thought.

Do you have any ideas about how network bandwidth might be used in Ganeti
itself?
At first glance, this information seems useful only in HTools. In that
case, node tags are the common way to pass the information: it is the same
mechanism HTools already uses to obtain location, migration, desired
location, and some other information.
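
As a rough illustration of the tag-based approach, here is a minimal Python
sketch of reading bandwidth values out of a list of node tags. The tag
prefix "example:bandwidth:" and the "name:value" layout are assumptions made
up for this example; they are not the format used by the patches or by
HTools.

```python
from typing import Dict, Iterable

# Hypothetical tag prefix, chosen for illustration only; the real tag
# names used by the patch series are not stated in this thread.
BANDWIDTH_TAG_PREFIX = "example:bandwidth:"


def bandwidth_from_tags(tags: Iterable[str]) -> Dict[str, float]:
    """Parse tags shaped like 'example:bandwidth:<name>:<mbit_per_s>'.

    Returns a mapping from <name> to the numeric value. Tags with a
    different prefix or a malformed payload are silently skipped.
    """
    result: Dict[str, float] = {}
    for tag in tags:
        if not tag.startswith(BANDWIDTH_TAG_PREFIX):
            continue
        payload = tag[len(BANDWIDTH_TAG_PREFIX):]
        # Split on the last colon so that <name> may itself contain colons.
        name, sep, value = payload.rpartition(":")
        if not sep or not name:
            continue
        try:
            result[name] = float(value)
        except ValueError:
            continue
    return result
```

Storing the values as proper object fields instead would remove the parsing
step, at the cost of a configuration format change.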

Sincerely,
Oleg

On Fri, Jun 24, 2016 at 3:17 PM Iustin Pop  wrote:

> On 23 June 2016 at 18:32, Даниил Лещёв  wrote:
>
>>
>>> I would slightly prefer if we discuss it over plain email (without
>>> patches), to see what you think about how complex the network model needs
>>> to be, and whether a static "time X" vs. semi-dynamic (based on the
>>> instance disk size) approach is best.
>>>
>>> Maybe there was some more information back at the start of the project?
>>> (I only started watching the mailing list again recently).
>>>
>> The initial plan was to implement "static" solutions, based on instance
>> disk size and then make it "dynamic" by using information about network
>> speed from data collectors.
>>
>
> Ack.
>
>> At the moment, we have a "semi-dynamic" solution, I think. The new tags may
>> specify network speed in the cluster (and between different parts of it).
>>
>
> I'll look at the patches, but if I read correctly—these are currently
> stored as tags. Would it make more sense to have them as proper values in
> the objects, so that (in the future) they can be used by other parts of the
> code? Just a thought.
>
>> I am assuming that this speed remains constant, since the network is usually
>> configured once and locally (for example, in a server rack).
>>
>
> That makes sense.
>
>
>> I think, with such an assumption, the network speed stays almost constant
>> and the time estimations for balancing solutions become predictable.
>>
>> I suggest using the new options to discard solutions that take a long
>> time and change the state of the cluster only slightly.
>> In my mind, the time to perform disk replication depends directly on
>> the network bandwidth.
>>
>
> Hmm, depends. On a gigabit or 10G network and with mechanical hard drives,
> the time will depend more on disk load.
>
> thanks,
> iustin
>


Re: [PATCH master 0/6] New balancing options implementation

2016-06-24 Thread 'Iustin Pop' via ganeti-devel
On 23 June 2016 at 18:32, Даниил Лещёв  wrote:

>
>> I would slightly prefer if we discuss it over plain email (without
>> patches), to see what you think about how complex the network model needs
>> to be, and whether a static "time X" vs. semi-dynamic (based on the
>> instance disk size) approach is best.
>>
>> Maybe there was some more information back at the start of the project?
>> (I only started watching the mailing list again recently).
>>
> The initial plan was to implement "static" solutions, based on instance
> disk size and then make it "dynamic" by using information about network
> speed from data collectors.
>

Ack.

> At the moment, we have a "semi-dynamic" solution, I think. The new tags may
> specify network speed in the cluster (and between different parts of it).
>

I'll look at the patches, but if I read correctly—these are currently
stored as tags. Would it make more sense to have them as proper values in
the objects, so that (in the future) they can be used by other parts of the
code? Just a thought.

> I am assuming that this speed remains constant, since the network is usually
> configured once and locally (for example, in a server rack).
>

That makes sense.


> I think, with such an assumption, the network speed stays almost constant
> and the time estimations for balancing solutions become predictable.
>
> I suggest using the new options to discard solutions that take a long
> time and change the state of the cluster only slightly.
> In my mind, the time to perform disk replication depends directly on the
> network bandwidth.
>

Hmm, depends. On a gigabit or 10G network and with mechanical hard drives,
the time will depend more on disk load.
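
To make the point concrete, here is a minimal Python sketch of a time
estimate that takes the slower of the two resources as the bottleneck. The
function name, the parameters, and the simple "size divided by slowest
throughput" model are assumptions for illustration only, not what the
patches implement.

```python
def estimate_replication_seconds(disk_size_mib: float,
                                 network_mib_per_s: float,
                                 disk_mib_per_s: float) -> float:
    """Rough estimate of the time needed to replicate one disk.

    Assumes (simplification) that the copy is limited by whichever of
    network bandwidth and sustained disk throughput is lower.
    """
    if disk_size_mib < 0:
        raise ValueError("disk size must be non-negative")
    effective = min(network_mib_per_s, disk_mib_per_s)
    if effective <= 0:
        raise ValueError("throughput values must be positive")
    return disk_size_mib / effective
```

With illustrative numbers, a 10240 MiB disk on a link that can carry
1000 MiB/s but with disks that sustain only 100 MiB/s is bounded by the
disks, so the estimate is driven by the disk figure rather than by the
network bandwidth.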

thanks,
iustin