Re: [OT] 20180917-Need Apache SOLR support

2018-09-18 Thread Jan Høydahl
I guess you could do a version-independent backup with the /export handler
and store docs in XML or JSON format. Or you could use streaming and store
the entire index as JSON tuples, which could then be ingested into another
version.
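
A minimal sketch of the /export idea, assuming a Solr node at a base URL
such as http://localhost:8983/solr and a collection whose exported fields
all have docValues (the /export handler needs explicit fl and sort
parameters and only returns docValues fields). The function names, the
collection name and the field names are illustrative, not part of Solr.

```python
import json
from urllib.parse import urlencode


def export_url(base_url, collection, fields, sort_field):
    """Build a request URL for Solr's /export handler.

    base_url, collection, fields and sort_field are caller-supplied
    assumptions; /export requires both fl and sort to be present.
    """
    params = {
        "q": "*:*",                    # match every document
        "fl": ",".join(fields),        # fields to export (docValues only)
        "sort": f"{sort_field} asc",   # /export insists on a sort
    }
    return f"{base_url}/{collection}/export?{urlencode(params)}"


def docs_to_json_lines(export_response_text):
    """Convert an /export JSON response body into one JSON doc per line."""
    body = json.loads(export_response_text)
    docs = body["response"]["docs"]
    return "\n".join(json.dumps(doc, sort_keys=True) for doc in docs)
```

The JSON-lines text could then be written to a file and re-posted to a
cluster running a different version. Fields without docValues would not
be captured this way.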

But it is correct that the backup/restore feature of Solr is not primarily
intended for archival or for moving a collection to a completely different
version. It is primarily intended as a much faster disaster recovery method
than reindexing from slow sources. But you COULD also use it to quickly
migrate from an old cluster to the next major version.

It would be cool to investigate an alternate backup command, which instructs
each shard leader to stream all documents to JSON inside the backup folder,
in parallel. But you may still get issues with the ZooKeeper part if
restoring to a very different version.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 18 Sep 2018, at 17:24, Walter Underwood wrote:
> 
> It isn’t very clear from that page, but the two backup methods make a copy
> of the indexes in a commit-aware way. That is all. One method copies them
> to a new server, the other to files in the data directory.
> 
> Database backups generally have a separate backup format which is 
> independent of the database version. For example, mysqldump generates
> a backup as SQL statements.
> 
> The Solr backup is version-locked, because it is just a copy of the index files.
> People who are used to database backups might be very surprised when they
> could not load a Solr backup into a server with a different version or on a
> different architecture.
> 
> The only version-independent restore in Solr is to reload the data from the
> source repository.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Sep 18, 2018, at 8:15 AM, Christopher Schultz wrote:
>> 
>> Walter,
>> 
>> On 9/17/18 11:39, Walter Underwood wrote:
>>> Do not use Solr as a database. It was never designed to be a
>>> database. It is missing a lot of features that are normal in
>>> databases.
>>> 
>>> [...] * no real backups (Solr backup is a cold server, not a
>>> dump/load)
>> 
>> I'm just curious... if Solr has "no real backups", why is there a
>> complete client API for performing backups and restores?
>> 
>> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html
>> 
>> Thanks,
>> - -chris
> 



Re: [OT] 20180917-Need Apache SOLR support

2018-09-18 Thread Erick Erickson
The only hard-and-fast rule is that you must re-index from source when
you upgrade two major versions (Solr X to Solr X+2). Solr (well, Lucene)
tries very hard to maintain one-major-version back-compatibility, so
Solr 8 will function with Solr 7 indexes but _not_ with any index
_ever touched_ by 6.x.
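
The rule can be written down as a small check. This is only a sketch of
the rule as stated above, not anything Solr ships; the function name and
its arguments are made up for illustration, and "oldest major" means the
oldest major version that has ever written to the index.

```python
def must_reindex(oldest_major_that_touched_index, target_major):
    """Return True when the stated rule requires a re-index from source.

    Assumption: Lucene reads indexes written by the previous major
    version only, so a gap of two or more major versions forces a
    re-index. Downgrades are outside the rule and are rejected.
    """
    if target_major < oldest_major_that_touched_index:
        raise ValueError("downgrades are not covered by this rule")
    return target_major - oldest_major_that_touched_index >= 2
```

For example, an index first written by 6.x and later upgraded in place
under 7.x still counts as 6 here, so moving it to Solr 8 needs a re-index.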

That said, it's usually a good idea to re-index anyway when jumping a
major version (say Solr 7 -> Solr 8) if possible.

Best,
Erick
On Tue, Sep 18, 2018 at 11:22 AM Christopher Schultz wrote:
>
> Walter,
>
> On 9/18/18 11:24, Walter Underwood wrote:
> > It isn’t very clear from that page, but the two backup methods make
> > a copy of the indexes in a commit-aware way. That is all. One
> > method copies them to a new server, the other to files in the data
> > directory.
> >
> > Database backups generally have a separate backup format which is
> > independent of the database version. For example, mysqldump
> > generates a backup as SQL statements.
> >
> > The Solr backup is version-locked, because it is just a copy of the
> > index files. People who are used to database backups might be very
> > surprised when they could not load a Solr backup into a server with
> > a different version or on a different architecture.
> >
> > The only version-independent restore in Solr is to reload the data
> > from the source repository.
>
> Thanks for the explanation.
>
> We recently re-built from source and it took about 10 minutes. If we
> can get better performance for a restore starting with a "backup"
> (which is likely), we'll probably go ahead and do that, with the
> understanding that the ultimate fallback is reload-from-source.
>
> When upgrading to a new version of Solr, what are the rules for when
> you have to discard your whole index and reload from source? We have
> been in the 7.x line since we began development and testing and have
> not had any reason to reload from source so far. (Well, except when we
> had to make schema changes.)
>
> Thanks,
> - -chris
>
> >> On Sep 18, 2018, at 8:15 AM, Christopher Schultz wrote:
> >>
> >> Walter,
> >>
> >> On 9/17/18 11:39, Walter Underwood wrote:
> >>> Do not use Solr as a database. It was never designed to be a
> >>> database. It is missing a lot of features that are normal in
> >>> databases.
> >>>
> >>> [...] * no real backups (Solr backup is a cold server, not a
> >>> dump/load)
> >>
> >> I'm just curious... if Solr has "no real backups", why is there a
> >> complete client API for performing backups and restores?
> >>
> >> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html
> >>
> >> Thanks, -chris


Re: [OT] 20180917-Need Apache SOLR support

2018-09-18 Thread Christopher Schultz
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Walter,

On 9/18/18 11:24, Walter Underwood wrote:
> It isn’t very clear from that page, but the two backup methods make
> a copy of the indexes in a commit-aware way. That is all. One
> method copies them to a new server, the other to files in the data
> directory.
> 
> Database backups generally have a separate backup format which is 
> independent of the database version. For example, mysqldump
> generates a backup as SQL statements.
> 
> The Solr backup is version-locked, because it is just a copy of the
> index files. People who are used to database backups might be very
> surprised when they could not load a Solr backup into a server with
> a different version or on a different architecture.
> 
> The only version-independent restore in Solr is to reload the data
> from the source repository.

Thanks for the explanation.

We recently re-built from source and it took about 10 minutes. If we
can get better performance for a restore starting with a "backup"
(which is likely), we'll probably go ahead and do that, with the
understanding that the ultimate fallback is reload-from-source.

When upgrading to a new version of Solr, what are the rules for when
you have to discard your whole index and reload from source? We have
been in the 7.x line since we began development and testing and have
not had any reason to reload from source so far. (Well, except when we
had to make schema changes.)

Thanks,
- -chris

>> On Sep 18, 2018, at 8:15 AM, Christopher Schultz wrote:
>>
>> Walter,
>>
>> On 9/17/18 11:39, Walter Underwood wrote:
>>> Do not use Solr as a database. It was never designed to be a
>>> database. It is missing a lot of features that are normal in
>>> databases.
>>>
>>> [...] * no real backups (Solr backup is a cold server, not a
>>> dump/load)
>>
>> I'm just curious... if Solr has "no real backups", why is there a
>> complete client API for performing backups and restores?
>>
>> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html
>>
>> Thanks, -chris
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhQlkACgkQHPApP6U8
pFgcyRAAm4/FeeGn3eGv4CwNVfc9GrsUYc4/YexdwRT7oFUgqTC2kYeegj/YAgm3
ZwgfLDkDL0HR51i/pp4UG8MDTB5NFtp8Jg6+JSE4SutAA72N6vnwnC1Z/T52i0xG
OqT0lFKeIL7Tt5c0FffbAMx5rgbFkzWHNWgFFqYFB0WZEzj4JM6rmAiDqLunRGPA
xAZUnZCRMXhcVZT0bmmnSGlyU+JHL0ZQrJD/WX4DOJo2ZyAvP7pSYBEU+nTfyjzJ
kE3rx1W9o269yc052FJTk5rRADuHIdirQQ/SrUN3O7Nn7Hqqi2/6sqyM34CF6wmX
IPv9frb/WTvXQ3nsFYmQVB1jEBBr5S+9pztO3jOtUbGGKCjBpVGDcOXJVBwEDzPW
yII5EjpjkoYwVB6shUI2nfaM/Y6r4aQLrZO6A5FFePhQTm6BGa/i2i1A1uLqfvHY
WMmv/QMYqXZu7hXW6l5NKpO1AtSKTZBq8iXi9BiOXSHNSxo9mT9kPLu40Uh63Gyp
EHI/SfAPWNwOj01pkbyV+siyhAWBVWpolN1SinnW3ZR16Yddd2lRmNxdfVCC32pL
OfRxrChtZ736kvm4ELzmUAUjITxpZf7AFgsrB6zyTlPRn/jvnW7sRsIsOa4BHdGC
e4oCzK7waITu6jam4Zz6e3efyxSDfT2YZ7811L098mody1n2g5k=
=PaVE
-----END PGP SIGNATURE-----


Re: [OT] 20180917-Need Apache SOLR support

2018-09-18 Thread Walter Underwood
It isn’t very clear from that page, but the two backup methods make a copy
of the indexes in a commit-aware way. That is all. One method copies them
to a new server, the other to files in the data directory.
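
For reference, a rough sketch of how the two backup requests on that page
are formed, as far as I understand the 7.4 guide: the Collections API
BACKUP action for SolrCloud and the replication handler's backup command
for a standalone core. The base URL, collection, core, backup name and
location values are placeholders, and the helper names are invented for
this example.

```python
from urllib.parse import urlencode


def collection_backup_url(base_url, collection, backup_name, location):
    """URL for a SolrCloud backup through the Collections API."""
    params = {
        "action": "BACKUP",
        "name": backup_name,
        "collection": collection,
        "location": location,   # must be reachable by every node
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def core_backup_url(base_url, core, backup_name, location):
    """URL for a standalone-core backup through the replication handler."""
    params = {
        "command": "backup",
        "name": backup_name,
        "location": location,
    }
    return f"{base_url}/{core}/replication?{urlencode(params)}"
```

Either request only triggers a copy of the index files, so the result
carries the same version lock described here.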

Database backups generally have a separate backup format which is 
independent of the database version. For example, mysqldump generates
a backup as SQL statements.

The Solr backup is version-locked, because it is just a copy of the index files.
People who are used to database backups might be very surprised when they
could not load a Solr backup into a server with a different version or on a
different architecture.

The only version-independent restore in Solr is to reload the data from the
source repository.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 18, 2018, at 8:15 AM, Christopher Schultz wrote:
> 
> Walter,
> 
> On 9/17/18 11:39, Walter Underwood wrote:
>> Do not use Solr as a database. It was never designed to be a
>> database. It is missing a lot of features that are normal in
>> databases.
>> 
>> [...] * no real backups (Solr backup is a cold server, not a
>> dump/load)
> 
> I'm just curious... if Solr has "no real backups", why is there a
> complete client API for performing backups and restores?
> 
> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html
> 
> Thanks,
> - -chris



Re: [OT] 20180917-Need Apache SOLR support

2018-09-18 Thread Christopher Schultz
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Walter,

On 9/17/18 11:39, Walter Underwood wrote:
> Do not use Solr as a database. It was never designed to be a
> database. It is missing a lot of features that are normal in
> databases.
> 
> [...] * no real backups (Solr backup is a cold server, not a
> dump/load)

I'm just curious... if Solr has "no real backups", why is there a
complete client API for performing backups and restores?

https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html

Thanks,
- -chris
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhFp8ACgkQHPApP6U8
pFgnhBAAre3Zb2mu++WVmY6rZlcc3uoRkDRva6iR602wA/w/EUabCmHEkO9maYEm
NoUREgBH9NtFPvYnjkEEL7/P/2hUErvRw0RfwsAo89ClYjjyMEH25+p5SNmudUmK
fKRSLRUyCbpE8ahKTPG44gRlki03uJJ2GA0r3vbTLvdqm1p5KO6sE4k/r3IYJ0QI
qZfUY4Un+LQ5vGMQ7qeGRcFhaAXVOaJmnLCRqGTS2hMTM1uM01TCblhOaeX5XHYD
Yra4m15Sr1H8p3S0CFsP8oqvDND0jEC4MxM9mQvHOvq9IwMreTSwACga35Wm6ItD
h1/Td9H/Puo8o9vQMaVfNcFD4TAqt+FkIHzQEb+FkQAMfbC9ZHsmBgvl8EUtPBq1
h2ODETEcD5SsmdfrP5OWUz+0OBhH7/HEgWRjHW9nSMzhPn4kYgpF/7VuFL8iy3re
/8TviTf446I859QNragWXACdARhCzMo8AoXIs/dC70CGDvxuKmEcI6tad9Zsxcf2
+yaFa3Fzddulaeao4juZVbRVJ9eewFOSawMXDc14TeL6t13CxzxFasHiYu0C5euV
XhKSWEHYj58ijS/KU4FMDCEWZhr1KWEKwfVp7hZ2CZZNW5kNPbv97otKvxB0cKyS
LTK6PtZoZbTWXFa8rT3yq28/x6gMULQeo0ZBZLTXEJKpfAT2vAU=
=Fh1S
-----END PGP SIGNATURE-----