Re: [zfs-discuss] HELP! RPool problem

2013-02-16 Thread Ian Collins

Sašo Kiselkov wrote:

On 02/16/2013 09:49 PM, John D Groenveld wrote:

Boot with kernel debugger so you can see the panic.

Sadly, though, without access to the source code, all he can do at that
point is log a support ticket with Oracle (assuming he has paid his
support fees) and hope it will get picked up by somebody there. People
on this list have few, if any, ways of helping out.


If he can boot from recent install media and import the pool, that's a
pretty good indicator that the problem has been fixed. He can then
upgrade to whatever he booted with (which could be OI or Solaris 11.1)
and recover his data.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HELP! RPool problem

2013-02-16 Thread Jim Klimov

On 2013-02-16 21:49, John D Groenveld wrote:

By the way, whatever the error message is when booting, it disappears so
quickly I can't read it, so I am only guessing that this is the reason.


Boot with kernel debugger so you can see the panic.

And that would go as follows:
1) In the boot loader (GRUB), edit the boot options (press "e",
   select the "kernel" line, press "e" again), and add "-kd" to the
   kernel boot arguments. Maybe also "-v" to add verbosity.

2) Press enter to save the change and "b" to boot

3) The kmdb prompt should pop up; enter ":c" to continue execution
   The bootup should start, throw the kernel panic and pause.
   It is likely that there would be so much info that it doesn't
   fit on screen - I can only suggest a serial console in this case.

   However, the end of the dump info should point you in the right
   direction. For example, a panic in "vfs_mountroot" is common,
   and usually means either corrupt media or simply an unexpected device
   name for the root pool (e.g. a disk plugged into a different port, or
   the BIOS switched between SATA and IDE modes).
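
To make the edit in step 1 concrete, here is a small sketch that only
prints what the kernel line would look like before and after. The stock
line below is an assumption (a typical Solaris 11 Express entry); the one
on your system may carry extra -B properties, so append to whatever is
actually there. The helper name append_boot_flags is made up for this
illustration, and nothing here touches GRUB itself.

```shell
#!/bin/sh
# Sketch only: prints the GRUB "kernel$" line with kmdb flags appended.
# The real edit is made by hand in the GRUB menu at boot time.

# Assumed stock line; check the one shown in your own GRUB menu.
STOCK_LINE='kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS'

# append_boot_flags LINE FLAG... : print LINE followed by the given flags.
# (Hypothetical helper, not a Solaris tool.)
append_boot_flags() {
    line=$1
    shift
    printf '%s %s\n' "$line" "$*"
}

# -k loads kmdb, -d stops in the debugger early in boot, -v boots verbosely.
append_boot_flags "$STOCK_LINE" -kd -v
```

The edit made from the GRUB menu is not saved to disk, so it applies to
that one boot only.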

The device name changes should go away if you can boot from anything
that can import your rpool (livecd, installer cd, failsafe boot image)
and just "zpool import -f rpool; zpool export rpool" - this should
clear the dependency on exact device names, and next bootup should
work.
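
As a rough sketch, the import/export cycle above could be wrapped like
this when run from the rescue environment. The function names and the
DRY_RUN switch are inventions for this example; only the two zpool
commands come from the advice above.

```shell
#!/bin/sh
# Sketch only: run from a live CD / installer shell, not the broken system.
# Set DRY_RUN=1 to print the commands instead of executing them.

# run CMD ARGS... : echo the command, then execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# refresh_pool_paths [POOL] : import and then export the pool so the device
# paths stored in it are rewritten. Stops if the import fails.
refresh_pool_paths() {
    pool=${1:-rpool}
    run zpool import -f "$pool" &&
    run zpool export "$pool"
}

# Usage (commented out so that sourcing this file changes nothing):
#   DRY_RUN=1; refresh_pool_paths rpool
```

If the import itself fails, the problem is more than a stale device name,
and the export is skipped.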

And yes, I think it is a bug for such a fixable problem to behave so
inconveniently - the official docs go as far as to suggest an OS
reinstallation in this case.

//Jim



Re: [zfs-discuss] HELP! RPool problem

2013-02-16 Thread James C. McPherson

On 17/02/13 08:48 AM, Sašo Kiselkov wrote:

On 02/16/2013 10:47 PM, James C. McPherson wrote:

...

Whether that message winds up being something you need
to talk with Oracle about is entirely different.


He got a kernel panic on a completely legitimate operation (booting with
one half of the root mirror faulted). There's a good chance that the
only thing he'll see is something like BAD TRAP and a stack trace.
Without source, that's where the investigation ends.


There is significant information provided in a panic message
which does NOT require that you go and ask Oracle for help.

As I pointed out, too, there is a non-Oracle source repo which
does contain the code which went into the release and build
which the OP is running. He's running Solaris 11 Express, which
we published/delivered as build snv_151b. One would hope that
there are sufficient hints in the previous 2 sentences to enable
debugging if that is required.



The OP mentioned that he was running S11 Express, for
which, iirc, you can dig through source on a non-Oracle
site and investigate.


And once he's found the problem, what then? Can he build a new ZFS
kernel module? Can he submit a patch?


You're assuming that he's found a bug which is unfixed,
and not related to failed hardware. Big assumption.



Really, though, just adding

-k

to the kernel$ line in the grub menu prior to booting
should be enough for him to make significant progress.


If by "significant progress" you mean sending a stack trace to Oracle,
then yes.


I think you are insulting the OP by assuming that he has
insufficient understanding of how to use a search engine.


Look, I'm not accusing you or anybody else of not trying to help - there
are some wonderful people around here who both care deeply for their
users and are proud of their work. I fully applaud that stance.
All I'm doing is just pointing out the facts of the matter - take from
that what you will.


Your opinion is no doubt coloured by the recent announcement
re opensolaris.org.

I have corresponded privately with the OP on this matter. I
will not respond further to this thread.


James C. McPherson
--
Oracle
Systems / Solaris / Core
http://www.jmcpdotcom.com/blog


Re: [zfs-discuss] HELP! RPool problem

2013-02-16 Thread Sašo Kiselkov
On 02/16/2013 10:47 PM, James C. McPherson wrote:
> On 17/02/13 06:54 AM, Sašo Kiselkov wrote:
>> On 02/16/2013 09:49 PM, John D Groenveld wrote:
>>> Boot with kernel debugger so you can see the panic.
>>
>> Sadly, though, without access to the source code, all he can do at that
>> point is log a support ticket with Oracle (assuming he has paid his
>> support fees) and hope it will get picked up by somebody there. People
>> on this list have few, if any, ways of helping out.
> 
> You're missing the point. Booting with kmdb enabled
> is The Way(tm) to get anything remotely resembling
> a paused screen so you can see what the message is.
> 
> Whether that message winds up being something you need
> to talk with Oracle about is entirely different.

He got a kernel panic on a completely legitimate operation (booting with
one half of the root mirror faulted). There's a good chance that the
only thing he'll see is something like BAD TRAP and a stack trace.
Without source, that's where the investigation ends.

> The OP mentioned that he was running S11 Express, for
> which, iirc, you can dig through source on a non-Oracle
> site and investigate.

And once he's found the problem, what then? Can he build a new ZFS
kernel module? Can he submit a patch?

> Really, though, just adding
> 
> -k
> 
> to the kernel$ line in the grub menu prior to booting
> should be enough for him to make significant progress.

If by "significant progress" you mean sending a stack trace to Oracle,
then yes.

Look, I'm not accusing you or anybody else of not trying to help - there
are some wonderful people around here who both care deeply for their
users and are proud of their work. I fully applaud that stance.
All I'm doing is just pointing out the facts of the matter - take from
that what you will.

Cheers,
--
Saso


Re: [zfs-discuss] HELP! RPool problem

2013-02-16 Thread James C. McPherson

On 17/02/13 06:54 AM, Sašo Kiselkov wrote:

On 02/16/2013 09:49 PM, John D Groenveld wrote:

Boot with kernel debugger so you can see the panic.


Sadly, though, without access to the source code, all he can do at that
point is log a support ticket with Oracle (assuming he has paid his
support fees) and hope it will get picked up by somebody there. People
on this list have few, if any, ways of helping out.


You're missing the point. Booting with kmdb enabled
is The Way(tm) to get anything remotely resembling
a paused screen so you can see what the message is.

Whether that message winds up being something you need
to talk with Oracle about is entirely different.

The OP mentioned that he was running S11 Express, for
which, iirc, you can dig through source on a non-Oracle
site and investigate.

Really, though, just adding

-k

to the kernel$ line in the grub menu prior to booting
should be enough for him to make significant progress.



James C. McPherson
--
Oracle
Systems / Solaris / Core
http://www.jmcpdotcom.com/blog


Re: [zfs-discuss] HELP! RPool problem

2013-02-16 Thread Sašo Kiselkov
On 02/16/2013 09:49 PM, John D Groenveld wrote:
> Boot with kernel debugger so you can see the panic.

Sadly, though, without access to the source code, all he can do at that
point is log a support ticket with Oracle (assuming he has paid his
support fees) and hope it will get picked up by somebody there. People
on this list have few, if any, ways of helping out.

Cheers,
--
Saso


Re: [zfs-discuss] HELP! RPool problem

2013-02-16 Thread John D Groenveld
Karl Wagner writes:
>The SSD was the first boot drive, and every time it tried to boot it
>panicked and rebooted, ending up in a loop. I tried to change to the second
>rpool drive, but either I forgot to install grub on it or it has become
>corrupted (probably the first, I can be that stupid at times).
>
>Can anyone give me any advice on how to get this system back? Can I trick
>grub, installed on the SSD, to boot from the HDD's rpool mirror? Is
>something more sinister going on?

Remove the broken drive, boot installation media, import the
mirror drive.
If it imports, you will be able to installgrub(1M).
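
A minimal sketch of that sequence, to be run from the installation
media's shell after pulling the broken drive. The device name in the
usage line is hypothetical (substitute the raw slice of your surviving
disk), the function names and DRY_RUN switch are made up for this
example, and the stage1/stage2 paths assume the copies shipped in the
booted media's /boot/grub.

```shell
#!/bin/sh
# Sketch only: import the surviving half of the mirror and put GRUB on it.
# Set DRY_RUN=1 to print the commands instead of executing them.

# run CMD ARGS... : echo the command, then execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# reinstall_grub POOL RAWDEV : import POOL, show its state, then install
# the GRUB stages onto RAWDEV (a raw slice under /dev/rdsk).
reinstall_grub() {
    pool=$1
    rawdev=$2
    if [ -z "$pool" ] || [ -z "$rawdev" ]; then
        echo "usage: reinstall_grub POOL RAWDEV" >&2
        return 2
    fi
    run zpool import -f "$pool" &&
    run zpool status "$pool" &&
    run installgrub /boot/grub/stage1 /boot/grub/stage2 "$rawdev"
}

# Usage (commented out; c0t1d0s0 is a placeholder device name):
#   DRY_RUN=1; reinstall_grub rpool /dev/rdsk/c0t1d0s0
```

The zpool status step is only there so you can confirm which disk is
healthy before writing a boot block to it.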

>By the way, whatever the error message is when booting, it disappears so
>quickly I can't read it, so I am only guessing that this is the reason.

Boot with kernel debugger so you can see the panic.

John
groenv...@acm.org


[zfs-discuss] HELP! RPool problem

2013-02-16 Thread Karl Wagner
I have a small problem.

I have a development fileserver box running Solaris 11 Express. The rpool
is mirrored between an SSD and a hard drive. Today, the SSD developed a
fault for some reason. While trying to diagnose the problem, the system
panicked and rebooted.

The SSD was the first boot drive, and every time it tried to boot it
panicked and rebooted, ending up in a loop. I tried to change to the second
rpool drive, but either I forgot to install grub on it or it has become
corrupted (probably the first, I can be that stupid at times).

Can anyone give me any advice on how to get this system back? Can I trick
grub, installed on the SSD, to boot from the HDD's rpool mirror? Is
something more sinister going on?

By the way, whatever the error message is when booting, it disappears so
quickly I can't read it, so I am only guessing that this is the reason.

PLEASE HELP!

Thanks
Karl