date:20180918

Sounds like your Solr was restarted as a SolrCloud, maybe by an
automated script or an init service?

If you created a core in a standalone mode and then restart the same
configuration in a SolrCloud mode, it would know that you have those
collections/cores, but will not be able to find any configuration
files (because it will expect them in ZooKeeper, not on disk). So,
that would explain the error.

I would focus on the restart point, maybe check the logs (in
server/logs) and see if there are hints there.

Regards,
   Alex.
P.s. Unless you are able to see the SolrCloud from one computer and
Solr from another at the same time/back-and-forth. That would indicate
some sort of proxy/routing and then I would look at the Overview page
system variables to confirm the directories/options. But that's kind
of a distant second possibility.

On 18 September 2018 at 14:23, Gu, Steve (CDC/OD/OADS) (CTR)
 wrote:
> I have set up my solr as a standalone service and the its url is 
> http://solr.server:8983/solr.  I opened 8983 on  solr.server to anyone, and 
> solr can be accessed from laptops/desktops.  But when I tried to access the 
> solr from some servers, I got the error of SolrCore Initialization Failures.  
> The left nav on the page is shown but indicates that the solr is set up as 
> SolrCloud, which is not.
>
> I am really confused about this and have no idea how to tackle this problem.  
> Has anyone ever had a similar issue?  Or any idea why this is happening?
>
> Thanks
> Steve
>

Re: Reason Why Query Does Not Work

Have a look at what debug shows in the parsed query. I think every
bracket is quite significant actually and you are generating a
different type of clause.

Also, have you thought about putting those individual clauses into
'fq' instead of jointly into 'q'? This may give you faster search too,
as Solr will not have to worry about ranking.

Regards,
   Alex.

On 18 September 2018 at 14:38, Antelmo Aguilar  wrote:
> Hi,
>
> I am doing some date queries and I was wondering if there is some way of
> getting this query to work.
>
> ( ( !{!field f=collection_date_range op=Within v='[2000-01-01 TO
> 2018-09-18]'} AND !{!field f=collection_date_range op=Within v='[1960-01-01
> TO 1998-09-18]'} ) AND collection_season:([1999-05 TO 1999-05]) )
>
> I understand that I could just not do NOT queries and instead search for
> 1998-09-18 TO 2000-01-01, but doing NOT queries gives me more results (e.g
> records that do not have collection_date_range defined).
>
> If I remove the parenthesis enclosing the NOT queries, it works.  Without
> the parenthesis the query does not return results though.  So the query
> below, does work.
>
> ( !{!field f=collection_date_range op=Within v='[2000-01-01 TO
> 2018-09-18]'} AND !{!field f=collection_date_range op=Within v='[1960-01-01
> TO 1998-09-18]'} AND collection_season:([1999-05 TO 1999-05]) )
>
> Any insight would be appreciated.  I really do not see the reason why the
> parenthesis enclosing the NOT queries would cause it to not return results.
>
> Best,
> Antelmo

RE: weird error for accessing solr

2018-09-18 Thread Gu, Steve (CDC/OD/OADS) (CTR)

No the solr was not restarted as SolrCloud.  We see solr from one computer and 
all cores are available for query, but from another computer, it shows the 
admin page as solrcloud with errors on the page.  All the links on the left nav 
 do not work either.

-Original Message-
From: Alexandre Rafalovitch  
Sent: Tuesday, September 18, 2018 2:39 PM
To: solr-user 
Subject: Re: weird error for accessing solr

Sounds like your Solr was restarted as a SolrCloud, maybe by an automated 
script or an init service?

If you created a core in a standalone mode and then restart the same 
configuration in a SolrCloud mode, it would know that you have those 
collections/cores, but will not be able to find any configuration files 
(because it will expect them in ZooKeeper, not on disk). So, that would explain 
the error.

I would focus on the restart point, maybe check the logs (in
server/logs) and see if there are hints there.

Regards,
   Alex.
P.s. Unless you are able to see the SolrCloud from one computer and Solr from 
another at the same time/back-and-forth. That would indicate some sort of 
proxy/routing and then I would look at the Overview page system variables to 
confirm the directories/options. But that's kind of a distant second 
possibility.

On 18 September 2018 at 14:23, Gu, Steve (CDC/OD/OADS) (CTR)  
wrote:
> I have set up my solr as a standalone service and the its url is 
> http://solr.server:8983/solr.  I opened 8983 on  solr.server to anyone, and 
> solr can be accessed from laptops/desktops.  But when I tried to access the 
> solr from some servers, I got the error of SolrCore Initialization Failures.  
> The left nav on the page is shown but indicates that the solr is set up as 
> SolrCloud, which is not.
>
> I am really confused about this and have no idea how to tackle this problem.  
> Has anyone ever had a similar issue?  Or any idea why this is happening?
>
> Thanks
> Steve
>

Re: Reason Why Query Does Not Work

Also, Solr does _not_ implement strict Boolean logic, although with
appropriate parentheses you can get it to look like Boolean logic.
See: https://lucidworks.com/2011/12/28/why-not-and-or-and-not/.

Additionally, for _some_ clauses a pure-not query is translated into
*:* -pure_not_query which is helpful, but occasionally confusing.

Best,
Erick
On Tue, Sep 18, 2018 at 11:43 AM Alexandre Rafalovitch
 wrote:
>
> Have a look at what debug shows in the parsed query. I think every
> bracket is quite significant actually and you are generating a
> different type of clause.
>
> Also, have you thought about putting those individual clauses into
> 'fq' instead of jointly into 'q'? This may give you faster search too,
> as Solr will not have to worry about ranking.
>
> Regards,
>Alex.
>
> On 18 September 2018 at 14:38, Antelmo Aguilar  wrote:
> > Hi,
> >
> > I am doing some date queries and I was wondering if there is some way of
> > getting this query to work.
> >
> > ( ( !{!field f=collection_date_range op=Within v='[2000-01-01 TO
> > 2018-09-18]'} AND !{!field f=collection_date_range op=Within v='[1960-01-01
> > TO 1998-09-18]'} ) AND collection_season:([1999-05 TO 1999-05]) )
> >
> > I understand that I could just not do NOT queries and instead search for
> > 1998-09-18 TO 2000-01-01, but doing NOT queries gives me more results (e.g
> > records that do not have collection_date_range defined).
> >
> > If I remove the parenthesis enclosing the NOT queries, it works.  Without
> > the parenthesis the query does not return results though.  So the query
> > below, does work.
> >
> > ( !{!field f=collection_date_range op=Within v='[2000-01-01 TO
> > 2018-09-18]'} AND !{!field f=collection_date_range op=Within v='[1960-01-01
> > TO 1998-09-18]'} AND collection_season:([1999-05 TO 1999-05]) )
> >
> > Any insight would be appreciated.  I really do not see the reason why the
> > parenthesis enclosing the NOT queries would cause it to not return results.
> >
> > Best,
> > Antelmo

Re: [OT] 20180917-Need Apache SOLR support

The only hard-and-fast rule is that you must re-index from source when
you upgrade to Solr X+2. Solr (well, Lucene) tries very hard to
maintain one-major-version back-compatibility, so Solr 8 will function
with Solr 7 indexes but _not_ any index _ever touched_ by 6x.

That said, it's usually a good idea to re-index anyway when jumping a
major version (say Solr 7 -> Solr 8) if possible.

Best,
Erick
On Tue, Sep 18, 2018 at 11:22 AM Christopher Schultz
 wrote:
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Walter,
>
> On 9/18/18 11:24, Walter Underwood wrote:
> > It isn’t very clear from that page, but the two backup methods make
> > a copy of the indexes in a commit-aware way. That is all. One
> > method copies them to a new server, the other to files in the data
> > directory.
> >
> > Database backups generally have a separate backup format which is
> > independent of the database version. For example, mysqldump
> > generates a backup as SQL statements.
> >
> > The Solr backup is version-locked, because it is just a copy of the
> > index files. People who are used to database backups might be very
> > surprised when they could not load a Solr backup into a server with
> > a different version or on a different architecture.
> >
> > The only version-independent restore in Solr is to reload the data
> > from the source repository.
>
> Thanks for the explanation.
>
> We recently re-built from source and it took about 10 minutes. If we
> can get better performance for a restore starting with a "backup"
> (which is likely), we'll probably go ahead and do that, with the
> understanding that the ultimate fallback is reload-from-source.
>
> When upgrading to a new version of Solr, what are the rules for when
> you have to discard your whole index and reload from source? We have
> been in the 7.x line since we began development and testing and have
> not had any reason to reload from source so far. (Well, except when we
> had to make schema changes.)
>
> Thanks,
> - -chris
>
> >> On Sep 18, 2018, at 8:15 AM, Christopher Schultz
> >>  wrote:
> >>
> > Walter,
> >
> > On 9/17/18 11:39, Walter Underwood wrote:
>  Do not use Solr as a database. It was never designed to be a
>  database. It is missing a lot of features that are normal in
>  databases.
> 
>  [...] * no real backups (Solr backup is a cold server, not a
>  dump/load)
> >
> > I'm just curious... if Solr has "no real backups", why is there a
> > complete client API for performing backups and restores?
> >
> > https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.
> ht
> >
> >
> ml
> >
> > Thanks, -chris
> >
> >
> -BEGIN PGP SIGNATURE-
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhQlkACgkQHPApP6U8
> pFgcyRAAm4/FeeGn3eGv4CwNVfc9GrsUYc4/YexdwRT7oFUgqTC2kYeegj/YAgm3
> ZwgfLDkDL0HR51i/pp4UG8MDTB5NFtp8Jg6+JSE4SutAA72N6vnwnC1Z/T52i0xG
> OqT0lFKeIL7Tt5c0FffbAMx5rgbFkzWHNWgFFqYFB0WZEzj4JM6rmAiDqLunRGPA
> xAZUnZCRMXhcVZT0bmmnSGlyU+JHL0ZQrJD/WX4DOJo2ZyAvP7pSYBEU+nTfyjzJ
> kE3rx1W9o269yc052FJTk5rRADuHIdirQQ/SrUN3O7Nn7Hqqi2/6sqyM34CF6wmX
> IPv9frb/WTvXQ3nsFYmQVB1jEBBr5S+9pztO3jOtUbGGKCjBpVGDcOXJVBwEDzPW
> yII5EjpjkoYwVB6shUI2nfaM/Y6r4aQLrZO6A5FFePhQTm6BGa/i2i1A1uLqfvHY
> WMmv/QMYqXZu7hXW6l5NKpO1AtSKTZBq8iXi9BiOXSHNSxo9mT9kPLu40Uh63Gyp
> EHI/SfAPWNwOj01pkbyV+siyhAWBVWpolN1SinnW3ZR16Yddd2lRmNxdfVCC32pL
> OfRxrChtZ736kvm4ELzmUAUjITxpZf7AFgsrB6zyTlPRn/jvnW7sRsIsOa4BHdGC
> e4oCzK7waITu6jam4Zz6e3efyxSDfT2YZ7811L098mody1n2g5k=
> =PaVE
> -END PGP SIGNATURE-

Re: Reason Why Query Does Not Work

2018-09-18 Thread Antelmo Aguilar

Hi Alex and Erick,

We could possibly put them in fq, but how we set everything up would make
it hard to do so, but going that route might be the only option.

I did take a look at the parsed query and this is the difference:

This is the one that works:
"-WithinPrefixTreeQuery(fieldName=collection_date_range,queryShape=[2000 TO
2018-09-18],detailLevel=9,prefixGridScanLevel=7)
-WithinPrefixTreeQuery(fieldName=collection_date_range,queryShape=[1960 TO
1998-09-18],detailLevel=9,prefixGridScanLevel=7)
+IntersectsPrefixTreeQuery(fieldName=collection_season,queryShape=1999-05,detailLevel=9,prefixGridScanLevel=8)"

This is the one that does not work
"+(-WithinPrefixTreeQuery(fieldName=collection_date_range,queryShape=[2000
TO 2018-09-18],detailLevel=9,prefixGridScanLevel=7)
-WithinPrefixTreeQuery(fieldName=collection_date_range,queryShape=[1960 TO
1998-09-18],detailLevel=9,prefixGridScanLevel=7))
+IntersectsPrefixTreeQuery(fieldName=collection_season,queryShape=1999-05,detailLevel=9,prefixGridScanLevel=8)"

If someone knows by just looking at these queries why I get no results in
the second one, I would appreciate it.  From looking at the page Erick
pointed out, I do not think it covers my case?  ((-X AND -Y) AND Z)

Sorry for the trouble and thanks again!

Best,
Antelmo

On Tue, Sep 18, 2018 at 2:56 PM, Erick Erickson 
wrote:

> Also, Solr does _not_ implement strict Boolean logic, although with
> appropriate parentheses you can get it to look like Boolean logic.
> See: https://lucidworks.com/2011/12/28/why-not-and-or-and-not/.
>
> Additionally, for _some_ clauses a pure-not query is translated into
> *:* -pure_not_query which is helpful, but occasionally confusing.
>
> Best,
> Erick
> On Tue, Sep 18, 2018 at 11:43 AM Alexandre Rafalovitch
>  wrote:
> >
> > Have a look at what debug shows in the parsed query. I think every
> > bracket is quite significant actually and you are generating a
> > different type of clause.
> >
> > Also, have you thought about putting those individual clauses into
> > 'fq' instead of jointly into 'q'? This may give you faster search too,
> > as Solr will not have to worry about ranking.
> >
> > Regards,
> >Alex.
> >
> > On 18 September 2018 at 14:38, Antelmo Aguilar  wrote:
> > > Hi,
> > >
> > > I am doing some date queries and I was wondering if there is some way
> of
> > > getting this query to work.
> > >
> > > ( ( !{!field f=collection_date_range op=Within v='[2000-01-01 TO
> > > 2018-09-18]'} AND !{!field f=collection_date_range op=Within
> v='[1960-01-01
> > > TO 1998-09-18]'} ) AND collection_season:([1999-05 TO 1999-05]) )
> > >
> > > I understand that I could just not do NOT queries and instead search
> for
> > > 1998-09-18 TO 2000-01-01, but doing NOT queries gives me more results
> (e.g
> > > records that do not have collection_date_range defined).
> > >
> > > If I remove the parenthesis enclosing the NOT queries, it works.
> Without
> > > the parenthesis the query does not return results though.  So the query
> > > below, does work.
> > >
> > > ( !{!field f=collection_date_range op=Within v='[2000-01-01 TO
> > > 2018-09-18]'} AND !{!field f=collection_date_range op=Within
> v='[1960-01-01
> > > TO 1998-09-18]'} AND collection_season:([1999-05 TO 1999-05]) )
> > >
> > > Any insight would be appreciated.  I really do not see the reason why
> the
> > > parenthesis enclosing the NOT queries would cause it to not return
> results.
> > >
> > > Best,
> > > Antelmo
>

Re: [OT] 20180917-Need Apache SOLR support

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Walter,

On 9/18/18 11:24, Walter Underwood wrote:
> It isn’t very clear from that page, but the two backup methods make
> a copy of the indexes in a commit-aware way. That is all. One
> method copies them to a new server, the other to files in the data
> directory.
> 
> Database backups generally have a separate backup format which is 
> independent of the database version. For example, mysqldump
> generates a backup as SQL statements.
> 
> The Solr backup is version-locked, because it is just a copy of the
> index files. People who are used to database backups might be very
> surprised when they could not load a Solr backup into a server with
> a different version or on a different architecture.
> 
> The only version-independent restore in Solr is to reload the data
> from the source repository.

Thanks for the explanation.

We recently re-built from source and it took about 10 minutes. If we
can get better performance for a restore starting with a "backup"
(which is likely), we'll probably go ahead and do that, with the
understanding that the ultimate fallback is reload-from-source.

When upgrading to a new version of Solr, what are the rules for when
you have to discard your whole index and reload from source? We have
been in the 7.x line since we began development and testing and have
not had any reason to reload from source so far. (Well, except when we
had to make schema changes.)

Thanks,
- -chris

>> On Sep 18, 2018, at 8:15 AM, Christopher Schultz
>>  wrote:
>> 
> Walter,
> 
> On 9/17/18 11:39, Walter Underwood wrote:
 Do not use Solr as a database. It was never designed to be a 
 database. It is missing a lot of features that are normal in 
 databases.
 
 [...] * no real backups (Solr backup is a cold server, not a 
 dump/load)
> 
> I'm just curious... if Solr has "no real backups", why is there a 
> complete client API for performing backups and restores?
> 
> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.
ht
>
> 
ml
> 
> Thanks, -chris
> 
> 
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhQlkACgkQHPApP6U8
pFgcyRAAm4/FeeGn3eGv4CwNVfc9GrsUYc4/YexdwRT7oFUgqTC2kYeegj/YAgm3
ZwgfLDkDL0HR51i/pp4UG8MDTB5NFtp8Jg6+JSE4SutAA72N6vnwnC1Z/T52i0xG
OqT0lFKeIL7Tt5c0FffbAMx5rgbFkzWHNWgFFqYFB0WZEzj4JM6rmAiDqLunRGPA
xAZUnZCRMXhcVZT0bmmnSGlyU+JHL0ZQrJD/WX4DOJo2ZyAvP7pSYBEU+nTfyjzJ
kE3rx1W9o269yc052FJTk5rRADuHIdirQQ/SrUN3O7Nn7Hqqi2/6sqyM34CF6wmX
IPv9frb/WTvXQ3nsFYmQVB1jEBBr5S+9pztO3jOtUbGGKCjBpVGDcOXJVBwEDzPW
yII5EjpjkoYwVB6shUI2nfaM/Y6r4aQLrZO6A5FFePhQTm6BGa/i2i1A1uLqfvHY
WMmv/QMYqXZu7hXW6l5NKpO1AtSKTZBq8iXi9BiOXSHNSxo9mT9kPLu40Uh63Gyp
EHI/SfAPWNwOj01pkbyV+siyhAWBVWpolN1SinnW3ZR16Yddd2lRmNxdfVCC32pL
OfRxrChtZ736kvm4ELzmUAUjITxpZf7AFgsrB6zyTlPRn/jvnW7sRsIsOa4BHdGC
e4oCzK7waITu6jam4Zz6e3efyxSDfT2YZ7811L098mody1n2g5k=
=PaVE
-END PGP SIGNATURE-

Reason Why Query Does Not Work

2018-09-18 Thread Antelmo Aguilar

Hi,

I am doing some date queries and I was wondering if there is some way of
getting this query to work.

( ( !{!field f=collection_date_range op=Within v='[2000-01-01 TO
2018-09-18]'} AND !{!field f=collection_date_range op=Within v='[1960-01-01
TO 1998-09-18]'} ) AND collection_season:([1999-05 TO 1999-05]) )

I understand that I could just not do NOT queries and instead search for
1998-09-18 TO 2000-01-01, but doing NOT queries gives me more results (e.g
records that do not have collection_date_range defined).

If I remove the parenthesis enclosing the NOT queries, it works.  Without
the parenthesis the query does not return results though.  So the query
below, does work.

( !{!field f=collection_date_range op=Within v='[2000-01-01 TO
2018-09-18]'} AND !{!field f=collection_date_range op=Within v='[1960-01-01
TO 1998-09-18]'} AND collection_season:([1999-05 TO 1999-05]) )

Any insight would be appreciated.  I really do not see the reason why the
parenthesis enclosing the NOT queries would cause it to not return results.

Best,
Antelmo

Re: weird error for accessing solr

Then you are either seeing different instances or your browser is
hard-caching the Admin pages. Trying shift-reload or anonymous mode to
get a full-refresh of HTML/Javascript. Or even a command line request.

Regards,
   Alex.

On 18 September 2018 at 14:43, Gu, Steve (CDC/OD/OADS) (CTR)
 wrote:
> No the solr was not restarted as SolrCloud.  We see solr from one computer 
> and all cores are available for query, but from another computer, it shows 
> the admin page as solrcloud with errors on the page.  All the links on the 
> left nav  do not work either.
>
>
>
> -Original Message-
> From: Alexandre Rafalovitch 
> Sent: Tuesday, September 18, 2018 2:39 PM
> To: solr-user 
> Subject: Re: weird error for accessing solr
>
> Sounds like your Solr was restarted as a SolrCloud, maybe by an automated 
> script or an init service?
>
> If you created a core in a standalone mode and then restart the same 
> configuration in a SolrCloud mode, it would know that you have those 
> collections/cores, but will not be able to find any configuration files 
> (because it will expect them in ZooKeeper, not on disk). So, that would 
> explain the error.
>
> I would focus on the restart point, maybe check the logs (in
> server/logs) and see if there are hints there.
>
> Regards,
>Alex.
> P.s. Unless you are able to see the SolrCloud from one computer and Solr from 
> another at the same time/back-and-forth. That would indicate some sort of 
> proxy/routing and then I would look at the Overview page system variables to 
> confirm the directories/options. But that's kind of a distant second 
> possibility.
>
>
> On 18 September 2018 at 14:23, Gu, Steve (CDC/OD/OADS) (CTR)  
> wrote:
>> I have set up my solr as a standalone service and the its url is 
>> http://solr.server:8983/solr.  I opened 8983 on  solr.server to anyone, and 
>> solr can be accessed from laptops/desktops.  But when I tried to access the 
>> solr from some servers, I got the error of SolrCore Initialization Failures. 
>>  The left nav on the page is shown but indicates that the solr is set up as 
>> SolrCloud, which is not.
>>
>> I am really confused about this and have no idea how to tackle this problem. 
>>  Has anyone ever had a similar issue?  Or any idea why this is happening?
>>
>> Thanks
>> Steve
>>

Command Line Indexer

2018-09-18 Thread Dan Brown

I've been working on this for a while and it's finally in a state where
it's ready for public consumption.

This is a command line indexer that will index CSV or JSON documents:
https://github.com/likethecolor/solr-indexer

There are quite a few parameters/options that can be set.

One thing to note is that it will update individual fields.  That is,
unlike the Data Import Handler, it does not replace entire documents.

Please check it out and let me know what you think.

Dan

RE: weird error for accessing solr

2018-09-18 Thread Gu, Steve (CDC/OD/OADS) (CTR)

Alex,

I tried to curl http://solr.server:8983/solr/ and got different results from 
different machines.  I also did shift-reload which gave me the same result.  So 
it does not seem to be a browser cache issue.

I also shut down solr and tried to access it.  It gave connection failure error 
for both client machines.  So it is not the case they are connecting to 
different instances.

A bit more information, for the machine that see solr correctly, it is a 
laptop/desktop for developers.  The other machine which solr admin page showed 
solrcloud with core initialization error, is a dev/qa server. 

Any help will be greatly appreciated.

Steve

-Original Message-
From: Alexandre Rafalovitch  
Sent: Tuesday, September 18, 2018 2:45 PM
To: solr-user 
Subject: Re: weird error for accessing solr

Then you are either seeing different instances or your browser is hard-caching 
the Admin pages. Trying shift-reload or anonymous mode to get a full-refresh of 
HTML/Javascript. Or even a command line request.

Regards,
   Alex.

On 18 September 2018 at 14:43, Gu, Steve (CDC/OD/OADS) (CTR)  
wrote:
> No the solr was not restarted as SolrCloud.  We see solr from one computer 
> and all cores are available for query, but from another computer, it shows 
> the admin page as solrcloud with errors on the page.  All the links on the 
> left nav  do not work either.
>
>
>
> -Original Message-
> From: Alexandre Rafalovitch 
> Sent: Tuesday, September 18, 2018 2:39 PM
> To: solr-user 
> Subject: Re: weird error for accessing solr
>
> Sounds like your Solr was restarted as a SolrCloud, maybe by an automated 
> script or an init service?
>
> If you created a core in a standalone mode and then restart the same 
> configuration in a SolrCloud mode, it would know that you have those 
> collections/cores, but will not be able to find any configuration files 
> (because it will expect them in ZooKeeper, not on disk). So, that would 
> explain the error.
>
> I would focus on the restart point, maybe check the logs (in
> server/logs) and see if there are hints there.
>
> Regards,
>Alex.
> P.s. Unless you are able to see the SolrCloud from one computer and Solr from 
> another at the same time/back-and-forth. That would indicate some sort of 
> proxy/routing and then I would look at the Overview page system variables to 
> confirm the directories/options. But that's kind of a distant second 
> possibility.
>
>
> On 18 September 2018 at 14:23, Gu, Steve (CDC/OD/OADS) (CTR)  
> wrote:
>> I have set up my solr as a standalone service and the its url is 
>> http://solr.server:8983/solr.  I opened 8983 on  solr.server to anyone, and 
>> solr can be accessed from laptops/desktops.  But when I tried to access the 
>> solr from some servers, I got the error of SolrCore Initialization Failures. 
>>  The left nav on the page is shown but indicates that the solr is set up as 
>> SolrCloud, which is not.
>>
>> I am really confused about this and have no idea how to tackle this problem. 
>>  Has anyone ever had a similar issue?  Or any idea why this is happening?
>>
>> Thanks
>> Steve
>>

Re: [OT] 20180917-Need Apache SOLR support

2018-09-18 Thread Jan Høydahl

I guess you could do a version-independent backup with /export handler and store
docs in XML or JSON format. Or you could use streaming and store the entire 
index
as JSON tuples, which could then be ingested into another version.

But it is correct that the backup/restore feature of Solr is not primarily 
intended for archival
or moving a collection to a completely different version. It is primarily 
intended as a
much faster disaster recovery method than reindex from slow sources. But you 
COULD
also use it to quickly migrate from an old cluster to the next major version.

It would be cool to investigate an alternate backup command, which instructs 
each shard
leader to stream all documents to JSON inside the backup folder, in parallell. 
But you may
still get issues with the Zookeeper part if restoring to a very different 
version.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 18. sep. 2018 kl. 17:24 skrev Walter Underwood :
> 
> It isn’t very clear from that page, but the two backup methods make a copy
> of the indexes in a commit-aware way. That is all. One method copies them
> to a new server, the other to files in the data directory.
> 
> Database backups generally have a separate backup format which is 
> independent of the database version. For example, mysqldump generates
> a backup as SQL statements.
> 
> The Solr backup is version-locked, because it is just a copy of the index 
> files.
> People who are used to database backups might be very surprised when they
> could not load a Solr backup into a server with a different version or on a
> different architecture.
> 
> The only version-independent restore in Solr is to reload the data from the
> source repository.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Sep 18, 2018, at 8:15 AM, Christopher Schultz 
>>  wrote:
>> 
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>> 
>> Walter,
>> 
>> On 9/17/18 11:39, Walter Underwood wrote:
>>> Do not use Solr as a database. It was never designed to be a
>>> database. It is missing a lot of features that are normal in
>>> databases.
>>> 
>>> [...] * no real backups (Solr backup is a cold server, not a
>>> dump/load)
>> 
>> I'm just curious... if Solr has "no real backups", why is there a
>> complete client API for performing backups and restores?
>> 
>> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.ht
>> ml
>> 
>> Thanks,
>> - -chris
>> -BEGIN PGP SIGNATURE-
>> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>> 
>> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhFp8ACgkQHPApP6U8
>> pFgnhBAAre3Zb2mu++WVmY6rZlcc3uoRkDRva6iR602wA/w/EUabCmHEkO9maYEm
>> NoUREgBH9NtFPvYnjkEEL7/P/2hUErvRw0RfwsAo89ClYjjyMEH25+p5SNmudUmK
>> fKRSLRUyCbpE8ahKTPG44gRlki03uJJ2GA0r3vbTLvdqm1p5KO6sE4k/r3IYJ0QI
>> qZfUY4Un+LQ5vGMQ7qeGRcFhaAXVOaJmnLCRqGTS2hMTM1uM01TCblhOaeX5XHYD
>> Yra4m15Sr1H8p3S0CFsP8oqvDND0jEC4MxM9mQvHOvq9IwMreTSwACga35Wm6ItD
>> h1/Td9H/Puo8o9vQMaVfNcFD4TAqt+FkIHzQEb+FkQAMfbC9ZHsmBgvl8EUtPBq1
>> h2ODETEcD5SsmdfrP5OWUz+0OBhH7/HEgWRjHW9nSMzhPn4kYgpF/7VuFL8iy3re
>> /8TviTf446I859QNragWXACdARhCzMo8AoXIs/dC70CGDvxuKmEcI6tad9Zsxcf2
>> +yaFa3Fzddulaeao4juZVbRVJ9eewFOSawMXDc14TeL6t13CxzxFasHiYu0C5euV
>> XhKSWEHYj58ijS/KU4FMDCEWZhr1KWEKwfVp7hZ2CZZNW5kNPbv97otKvxB0cKyS
>> LTK6PtZoZbTWXFa8rT3yq28/x6gMULQeo0ZBZLTXEJKpfAT2vAU=
>> =Fh1S
>> -END PGP SIGNATURE-
>

Re: Modify Schema for Solr Cloud

2018-09-18 Thread Jan Høydahl

Three ways:

1. Use Admin UI Schema tab and add/delete fields/copyfields there. Not support 
for fieldTypes
2. Use Schema API, see ref.guide
3. bin/solr zk cp zk:/configs/myconfig/managed-schema .
go ahead edit schema
bin/solr zk cp managed-schema zk:/configs/myconfig/managed-schema
reload collection in UI

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 18. sep. 2018 kl. 07:30 skrev Rathor, Piyush (US - Philadelphia) 
> :
> 
> Hi All,
>  
> I am new to solr cloud.
>  
> Can you please let me know on how to update the schema on solr cloud.
>  
> Thanks & Regards
> Piyush Rathor
> Consultant
> Deloitte Digital (Salesforce.com  / Force.com 
> )
> Deloitte Consulting Pvt. Ltd.
> Office: +1 (615) 209 4980
> Mobile : +1 (302) 397 1491
> prat...@deloitte.com  | www.deloitte.com 
> 
> 
> Please consider the environment before printing.
>  
> This message (including any attachments) contains confidential information 
> intended for a specific individual and purpose, and is protected by law. If 
> you are not the intended recipient, you should delete this message and any 
> disclosure, copying, or distribution of this message, or the taking of any 
> action based on it, by you is strictly prohibited.
> 
> v.E.1
>

Re: weird error for accessing solr


On 9/18/2018 12:23 PM, Gu, Steve (CDC/OD/OADS) (CTR) wrote:

I have set up my solr as a standalone service and the its url is 
http://solr.server:8983/solr.  I opened 8983 on  solr.server to anyone, and 
solr can be accessed from laptops/desktops.  But when I tried to access the 
solr from some servers, I got the error of SolrCore Initialization Failures.  
The left nav on the page is shown but indicates that the solr is set up as 
SolrCloud, which is not.


On the dashboard when you see the Cloud tab, can you share *ALL* of 
what's under JVM in the Args section?


Thanks,
Shawn

Re: Solr 7.2.1 Collection Backup Performance issue

On 9/18/2018 11:00 AM, Ganesh Sethuraman wrote:

We are using Solr 7.2.1 with SolrCloud with 35 collections with 1 node ZK
ensemble (in lower environment, we will have 3 nodes ensemble) in AWS. We
are testing to see if we have Async Solr Cloud backup (
https://lucene.apache.org/solr/guide/7_2/collections-api.html#backup) done
every time we are create a new collection or update an existing collection.
There are 1 replica and 8 shards per collection. Two Solr nodes.

For the largest collection (index size of 80GB), we see that BACKUP to the
EFS drive takes about ~10 mins. We are doing lot of /get (real time get)
option from the application. We are seeing that that the performance
significantly (2x) degrades on the read (get) performance when we BACK-UP
is going on in parallel.

My best guess here is that you do not have enough memory. For good
performance, Solr is extremely reliant on having certain parts of the
index data sitting in memory, so that it doesn't have to actually read
the disk to discover matches for a query. When all is working well,
that data will be read from memory instead of the disk. Memory is MUCH
MUCH faster than a disk.

Making a backup is going to read ALL of the index data. So if you do
not have enough spare memory to cache the entire index, reading the
index to make the backup is going to push the important parts of the
index out of the cache, and then Solr will have to actually go and read
the disk in order to satisfy a query.

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Can you gather a screenshot of your process list and put it on a file
sharing website? You'll find instructions on how to do this here:

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

Thanks,
Shawn

Re: Command Line Indexer

1. Congrats!
2. How is this different from bin\post? CSV and JSON are both
supported formats. I am sure it is very clear to you, but to a visitor
- not so much.
3. What is the significance of "replace just the field". Is that an
atomic update? Similar to AtomicUpdateProcessorFactory? What is the
use-case?

Basically, what is the business/use-case for the tool, as opposed to
all the technical parameters, one by one.

Regards,
   Alex.

On 18 September 2018 at 14:51, Dan Brown  wrote:
> I've been working on this for a while and it's finally in a state where
> it's ready for public consumption.
>
> This is a command line indexer that will index CSV or JSON documents:
> https://github.com/likethecolor/solr-indexer
>
> There are quite a few parameters/options that can be set.
>
> One thing to note is that it will update individual fields.  That is,
> unlike the Data Import Handler, it does not replace entire documents.
>
> Please check it out and let me know what you think.
>
> Dan

highlighting in more like this?

2018-09-18 Thread Matt Work Coarr

Is it possible to get highlighting in more like this queries?  My initial
attempts seem to indicate that it isn't possible (I've only attempted this
via modifying MLT query urls)

(I'm looking for something similar to hl=true=field1,field5,field6 in
a normal search)

Thanks,
Matt

Re: Solr standalone health checks

Take a look at the metrics available starting with 6.4
(https://lucene.apache.org/solr/guide/7_4/performance-statistics-reference.html).

Or just hit: http://blahblahblah/solr/admin/metrics to see them all.
WARNING: be prepared to spend an hour looking through the list, there
are a _log_ of them.

Erick
On Mon, Sep 17, 2018 at 2:45 PM Christopher Schultz
 wrote:
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Shawn,
>
> On 9/17/18 17:21, Shawn Heisey wrote:
> > On 9/17/2018 3:01 PM, Christopher Schultz wrote:
> >> The basic questions I'd like to have answered on a regular basis
> >> are:
> >>
> >> 1. Is the JVM up (this can be done with a ping, of course) 2. Is
> >> the heap healthy? Any OOMEs? 3. Will a sample query return in a
> >> reasonable amount of time?
> >>
> >> 1 and 3 are quite easily done using e.g. /solr/[c]/ping, but #2
> >> is trickier. I can do this via JMX, but I'd prefer to avoid
> >> spinning-up a whole JVM just to probe Solr for one or two
> >> values.
> >
> > If your Solr version is at least 5.5.1 and you're NOT on Windows,
> > number 2 can also be verified by a ping request.
>
> Interesting. I did mention 7.4.0 but not my OS. I'm on Debian Linux,
> and I'm running Solr using the Solr-supplied init.d scripts (via solr
> install).
>
> > With a new enough version on the correct operating system, Solr is
> > started with an option that will kill the process should an
> > OutOfMemoryError occur.  When that happens, it won't be able to
> > answer a ping request.
> >
> > Here's the issue that fixes a problem with the startup on 5.5.1 or
> > later:
> >
> > https://issues.apache.org/jira/browse/SOLR-8145
>
> Given that, I'll go ahead and set things up to do a simple
> /solr/[c]/ping request for health-monitoring.
>
> Thanks,
> - -chris
> -BEGIN PGP SIGNATURE-
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlugIGMACgkQHPApP6U8
> pFi8YQ//dwJdJG1VtqUgbFI437HzUMhuI+9SBOf0nateQFqQbfoqkLhC/z3dwjvj
> qqhqcT68D2x1bYYk/5we7KD9I6PZ50mL5sZlU34NYC9AFMB5QEdTtWljlqGM/Xoe
> elvsKYJVmZn9kvc6iwqyLU71clcRX27NhEDAFrPrCmhgZKRTpNqtgYyEOsIJZ/CL
> muMml4hV5eNIc+VOle+jcqwTrWY4xtaf6Fmo6NLCsUvC2CB5/QI7JoYzvnLvVVMD
> IVn6AnsLd/wIVSJiPyVYDA58/pVj1w6Jb36L8eg0fxfoO+eAkObUU3s71QglZlIx
> m9Qkd8lGQ7qNxUDOMSgPNW/j7tZcxn39FRsM9b3z7kWJGriBcz/S5jX9QSNcArmh
> pyHIf48y8wOgl/wQsmsGgXsHtdlwJu+84B3sFGjUKQU/2JPO88XJEo+pKluaMFDO
> E2yZGdTvfRbXLTqe/XCGN89yKyIOKJAX2ZXP9EU0PmFSFbeod6oqbT/MKO3+DzCm
> PpkUV10vlmqnsJ+5edj89hmM5gJOKcwQTDZ2E/U5tvs4DJHZTG578hnZp1coDU/c
> m7M80m5SyE/5ycYBODp6oyJNAkEf6suJ+BIyQkr61t9/L7yvwSm80nFheFpVMIMX
> N/lRL9ar4U/lLDL00aVhDecyNSFOvDjSUBlIlQ4hUb80bZiz3xY=
> =lOp1
> -END PGP SIGNATURE-

Re: [OT] 20180917-Need Apache SOLR support

2018-09-18 Thread Walter Underwood

It isn’t very clear from that page, but the two backup methods make a copy
of the indexes in a commit-aware way. That is all. One method copies them
to a new server, the other to files in the data directory.

Database backups generally have a separate backup format which is 
independent of the database version. For example, mysqldump generates
a backup as SQL statements.

The Solr backup is version-locked, because it is just a copy of the index files.
People who are used to database backups might be very surprised when they
could not load a Solr backup into a server with a different version or on a
different architecture.

The only version-independent restore in Solr is to reload the data from the
source repository.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 18, 2018, at 8:15 AM, Christopher Schultz 
>  wrote:
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
> 
> Walter,
> 
> On 9/17/18 11:39, Walter Underwood wrote:
>> Do not use Solr as a database. It was never designed to be a
>> database. It is missing a lot of features that are normal in
>> databases.
>> 
>> [...] * no real backups (Solr backup is a cold server, not a
>> dump/load)
> 
> I'm just curious... if Solr has "no real backups", why is there a
> complete client API for performing backups and restores?
> 
> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.ht
> ml
> 
> Thanks,
> - -chris
> -BEGIN PGP SIGNATURE-
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhFp8ACgkQHPApP6U8
> pFgnhBAAre3Zb2mu++WVmY6rZlcc3uoRkDRva6iR602wA/w/EUabCmHEkO9maYEm
> NoUREgBH9NtFPvYnjkEEL7/P/2hUErvRw0RfwsAo89ClYjjyMEH25+p5SNmudUmK
> fKRSLRUyCbpE8ahKTPG44gRlki03uJJ2GA0r3vbTLvdqm1p5KO6sE4k/r3IYJ0QI
> qZfUY4Un+LQ5vGMQ7qeGRcFhaAXVOaJmnLCRqGTS2hMTM1uM01TCblhOaeX5XHYD
> Yra4m15Sr1H8p3S0CFsP8oqvDND0jEC4MxM9mQvHOvq9IwMreTSwACga35Wm6ItD
> h1/Td9H/Puo8o9vQMaVfNcFD4TAqt+FkIHzQEb+FkQAMfbC9ZHsmBgvl8EUtPBq1
> h2ODETEcD5SsmdfrP5OWUz+0OBhH7/HEgWRjHW9nSMzhPn4kYgpF/7VuFL8iy3re
> /8TviTf446I859QNragWXACdARhCzMo8AoXIs/dC70CGDvxuKmEcI6tad9Zsxcf2
> +yaFa3Fzddulaeao4juZVbRVJ9eewFOSawMXDc14TeL6t13CxzxFasHiYu0C5euV
> XhKSWEHYj58ijS/KU4FMDCEWZhr1KWEKwfVp7hZ2CZZNW5kNPbv97otKvxB0cKyS
> LTK6PtZoZbTWXFa8rT3yq28/x6gMULQeo0ZBZLTXEJKpfAT2vAU=
> =Fh1S
> -END PGP SIGNATURE-

Solr 7.2.1 Collection Backup Performance issue

2018-09-18 Thread Ganesh Sethuraman

Hi

We are using Solr 7.2.1 with SolrCloud with 35 collections with 1 node ZK
ensemble (in lower environment, we will have 3 nodes ensemble) in AWS. We
are testing to see if we have Async Solr Cloud backup  (
https://lucene.apache.org/solr/guide/7_2/collections-api.html#backup) done
every time we are create a new collection or update an existing collection.
There are 1 replica and 8 shards per collection. Two Solr nodes.

For the largest collection (index size of 80GB), we see that BACKUP to the
EFS drive takes about ~10 mins. We are doing lot of /get (real time get)
option from the application. We are seeing that that the performance
significantly (2x) degrades on the read (get) performance when we BACK-UP
is going on in parallel.

Is there anyway to tune the system so that read does not suffer?

Any other best practices? like should we run back up during off peak load?

Is there a way to keep track of which collections are already backed up?

Re: weird error for accessing solr

bq. can you share *ALL* of...

from both machines!
On Tue, Sep 18, 2018 at 12:40 PM Shawn Heisey  wrote:
>
> On 9/18/2018 12:23 PM, Gu, Steve (CDC/OD/OADS) (CTR) wrote:
> > I have set up my solr as a standalone service and the its url is 
> > http://solr.server:8983/solr.  I opened 8983 on  solr.server to anyone, and 
> > solr can be accessed from laptops/desktops.  But when I tried to access the 
> > solr from some servers, I got the error of SolrCore Initialization 
> > Failures.  The left nav on the page is shown but indicates that the solr is 
> > set up as SolrCloud, which is not.
>
> On the dashboard when you see the Cloud tab, can you share *ALL* of
> what's under JVM in the Args section?
>
> Thanks,
> Shawn
>

Re: Command Line Indexer

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Dan,

On 9/18/18 2:51 PM, Dan Brown wrote:
> I've been working on this for a while and it's finally in a state
> where it's ready for public consumption.
> 
> This is a command line indexer that will index CSV or JSON
> documents: https://github.com/likethecolor/solr-indexer
> 
> There are quite a few parameters/options that can be set.
> 
> One thing to note is that it will update individual fields.  That
> is, unlike the Data Import Handler, it does not replace entire
> documents.
> 
> Please check it out and let me know what you think.

How is this different from the bin/post tool that ships with Solr?

Or is that you meant when you said "this is unlike the Data Import
Handler".

AIUI, Solr doesn't support updating a single field in a document. The
document is replaced no matter how hard to try to be surgical about
updating a single field.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhXlYACgkQHPApP6U8
pFjIeQ/+PRIx+I+IDW9XTqGNV5TIWYf+yQKC/4JpTV4Ndj7MZLsEEw+cfMvFTvQt
44dK7CnDKEDgQHZlMccWKd9/Th1k/5g40VMugBMsayRwUc83Onawdi4HQfnig4et
VN0/RaZ/IBo2AThsgEvUNplXYyY3BtyrUt6miiBsVkhKstI/BnmKqZvsRgvVjH0P
K1Xc5F2LNyXswvoIZqd3YmEa9p7CYMy7COsFV9KOeSymKlB7UoHulZqpJ9MRYkmn
YWjc9dHIRjpz5TUrJqWhZUG03uGXGtTnaXEku1Hb98WyIUZcHxkwN8W7qm6/B0CG
inPxfGRFH9EbUdcK4qeXmbQqty2sbKMQ6hogpRd/NEzgSWjDapiEUT1xz+p5V6wG
XM0ILaiLJ8zHJA6oUY0w5SNNyhdnd76CDpCK7T7YBm+aIxUDv9zoj6TLNceEaLi0
SjfI83LvaR1gM/ZeVO77d+1IY9maU1+5m0EZFjAETfMGj5dwYRvBub0Oo6QQuLUm
roF5R5b/bg/WjjPF1n4CJ7gTr/WBMzahKFnnQvoYD3OQqZpoasoEUifPpSd9OgvO
yEok0VqwxPeXdHgE+Vy+BlXn6QqshB3BYnUSNbpFXlNsOIQojfJXkjcCa+dP1nyF
JCElvmEgBG8K1WzGo4WAtVqJs7WDzQlmY2RDrETGsVbnqkTojXA=
=AmkJ
-END PGP SIGNATURE-

Re: Reason Why Query Does Not Work

I think this is the issue with top-level negative clause. Lucene does
not know what "-x" means without "*:* -x" to establish the baseline
set to subtract from. Solr has a workaround for top-level negative
query, so "-WithinPrefixTreeQuery..." triggers that special treatment.
But "+(-WithinPrefixTreeQuery" does not and therefore it silently
fails.

There is quite a bit of discussion in the archives about it. I am just
summarizing it, hopefully correctly. You can search the terms above to
find more detailed answers.

Regards,
   Alex.

On 18 September 2018 at 15:05, Antelmo Aguilar  wrote:
> Hi Alex and Erick,
>
> We could possibly put them in fq, but how we set everything up would make
> it hard to do so, but going that route might be the only option.
>
> I did take a look at the parsed query and this is the difference:
>
> This is the one that works:
> "-WithinPrefixTreeQuery(fieldName=collection_date_range,queryShape=[2000 TO
> 2018-09-18],detailLevel=9,prefixGridScanLevel=7)
> -WithinPrefixTreeQuery(fieldName=collection_date_range,queryShape=[1960 TO
> 1998-09-18],detailLevel=9,prefixGridScanLevel=7)
> +IntersectsPrefixTreeQuery(fieldName=collection_season,queryShape=1999-05,detailLevel=9,prefixGridScanLevel=8)"
>
> This is the one that does not work
> "+(-WithinPrefixTreeQuery(fieldName=collection_date_range,queryShape=[2000
> TO 2018-09-18],detailLevel=9,prefixGridScanLevel=7)
> -WithinPrefixTreeQuery(fieldName=collection_date_range,queryShape=[1960 TO
> 1998-09-18],detailLevel=9,prefixGridScanLevel=7))
> +IntersectsPrefixTreeQuery(fieldName=collection_season,queryShape=1999-05,detailLevel=9,prefixGridScanLevel=8)"
>
> If someone knows by just looking at these queries why I get no results in
> the second one, I would appreciate it.  From looking at the page Erick
> pointed out, I do not think it covers my case?  ((-X AND -Y) AND Z)
>
> Sorry for the trouble and thanks again!
>
> Best,
> Antelmo
>
> On Tue, Sep 18, 2018 at 2:56 PM, Erick Erickson 
> wrote:
>
>> Also, Solr does _not_ implement strict Boolean logic, although with
>> appropriate parentheses you can get it to look like Boolean logic.
>> See: https://lucidworks.com/2011/12/28/why-not-and-or-and-not/.
>>
>> Additionally, for _some_ clauses a pure-not query is translated into
>> *:* -pure_not_query which is helpful, but occasionally confusing.
>>
>> Best,
>> Erick
>> On Tue, Sep 18, 2018 at 11:43 AM Alexandre Rafalovitch
>>  wrote:
>> >
>> > Have a look at what debug shows in the parsed query. I think every
>> > bracket is quite significant actually and you are generating a
>> > different type of clause.
>> >
>> > Also, have you thought about putting those individual clauses into
>> > 'fq' instead of jointly into 'q'? This may give you faster search too,
>> > as Solr will not have to worry about ranking.
>> >
>> > Regards,
>> >Alex.
>> >
>> > On 18 September 2018 at 14:38, Antelmo Aguilar  wrote:
>> > > Hi,
>> > >
>> > > I am doing some date queries and I was wondering if there is some way
>> of
>> > > getting this query to work.
>> > >
>> > > ( ( !{!field f=collection_date_range op=Within v='[2000-01-01 TO
>> > > 2018-09-18]'} AND !{!field f=collection_date_range op=Within
>> v='[1960-01-01
>> > > TO 1998-09-18]'} ) AND collection_season:([1999-05 TO 1999-05]) )
>> > >
>> > > I understand that I could just not do NOT queries and instead search
>> for
>> > > 1998-09-18 TO 2000-01-01, but doing NOT queries gives me more results
>> (e.g
>> > > records that do not have collection_date_range defined).
>> > >
>> > > If I remove the parenthesis enclosing the NOT queries, it works.
>> Without
>> > > the parenthesis the query does not return results though.  So the query
>> > > below, does work.
>> > >
>> > > ( !{!field f=collection_date_range op=Within v='[2000-01-01 TO
>> > > 2018-09-18]'} AND !{!field f=collection_date_range op=Within
>> v='[1960-01-01
>> > > TO 1998-09-18]'} AND collection_season:([1999-05 TO 1999-05]) )
>> > >
>> > > Any insight would be appreciated.  I really do not see the reason why
>> the
>> > > parenthesis enclosing the NOT queries would cause it to not return
>> results.
>> > >
>> > > Best,
>> > > Antelmo
>>

Re: Command Line Indexer

2018-09-18 Thread Dan Brown

1. Thank you.

2. I think this is what you're looking for.  You'd be able to be more
specific than with bin/post.  For instance:
a. specify the CSV delimiter, CSV quote character, and multivalued field
delimiter
b. the dynamic-fields feature let's you write plugins in Java to define
values (very simple example: combine field values f_name, m_name, l_name to
populate a full_name field)
c. specify field order for mapping onto SOLR fields, data types, date
formats of source data; perhaps your CSV headers/JSON keys don't cleanly
map to SOLR field names
d. flag whether the first row of a CSV is the header and should not be
indexed
e. use literal values - e.g., instead of having to alter the source data to
have a column whose value is "foo" you can configure a field to always have
the same literal value for all documents
f. set the number of times to retry when there is an error and the amount
of time between retries (e.g., sometimes zk was not consistently responsive)
g. skip fields - e.g., your data have 10 columns but you only want to index
columns 1, 3, 5, and 9
h. send soft commits after a specified number of batches
i. combine fields to generate the uniqueKey value

3. Yes, atomic updates.  For instance, index data using DIH then use this
index to provide additional values to fields in those documents (e.g.,
maybe the extra data come from a different data source like BigQuery).

I hope this brings more clarity to this tool's features and answers all
your questions.  Please ask questions if anyone has more.

Dan

On Tue, Sep 18, 2018 at 3:21 PM Christopher Schultz <
ch...@christopherschultz.net> wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Dan,
>
> On 9/18/18 2:51 PM, Dan Brown wrote:
> > I've been working on this for a while and it's finally in a state
> > where it's ready for public consumption.
> >
> > This is a command line indexer that will index CSV or JSON
> > documents: https://github.com/likethecolor/solr-indexer
> >
> > There are quite a few parameters/options that can be set.
> >
> > One thing to note is that it will update individual fields.  That
> > is, unlike the Data Import Handler, it does not replace entire
> > documents.
> >
> > Please check it out and let me know what you think.
>
> How is this different from the bin/post tool that ships with Solr?
>
> Or is that you meant when you said "this is unlike the Data Import
> Handler".
>
> AIUI, Solr doesn't support updating a single field in a document. The
> document is replaced no matter how hard to try to be surgical about
> updating a single field.
>
> - -chris
> -BEGIN PGP SIGNATURE-
> Comment: GPGTools - http://gpgtools.org
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhXlYACgkQHPApP6U8
> pFjIeQ/+PRIx+I+IDW9XTqGNV5TIWYf+yQKC/4JpTV4Ndj7MZLsEEw+cfMvFTvQt
> 44dK7CnDKEDgQHZlMccWKd9/Th1k/5g40VMugBMsayRwUc83Onawdi4HQfnig4et
> VN0/RaZ/IBo2AThsgEvUNplXYyY3BtyrUt6miiBsVkhKstI/BnmKqZvsRgvVjH0P
> K1Xc5F2LNyXswvoIZqd3YmEa9p7CYMy7COsFV9KOeSymKlB7UoHulZqpJ9MRYkmn
> YWjc9dHIRjpz5TUrJqWhZUG03uGXGtTnaXEku1Hb98WyIUZcHxkwN8W7qm6/B0CG
> inPxfGRFH9EbUdcK4qeXmbQqty2sbKMQ6hogpRd/NEzgSWjDapiEUT1xz+p5V6wG
> XM0ILaiLJ8zHJA6oUY0w5SNNyhdnd76CDpCK7T7YBm+aIxUDv9zoj6TLNceEaLi0
> SjfI83LvaR1gM/ZeVO77d+1IY9maU1+5m0EZFjAETfMGj5dwYRvBub0Oo6QQuLUm
> roF5R5b/bg/WjjPF1n4CJ7gTr/WBMzahKFnnQvoYD3OQqZpoasoEUifPpSd9OgvO
> yEok0VqwxPeXdHgE+Vy+BlXn6QqshB3BYnUSNbpFXlNsOIQojfJXkjcCa+dP1nyF
> JCElvmEgBG8K1WzGo4WAtVqJs7WDzQlmY2RDrETGsVbnqkTojXA=
> =AmkJ
> -END PGP SIGNATURE-
>

Re: Command Line Indexer

2018-09-18 Thread Dan Brown

Yup, thanks for the clarification.  I see now that some of the items I list
in 2 are moot.

On Tue, Sep 18, 2018 at 4:16 PM Alexandre Rafalovitch 
wrote:

> Uhm, inline:
>
> On 18 September 2018 at 17:05, Dan Brown  wrote:
> > 1. Thank you.
> >
> > 2. I think this is what you're looking for.  You'd be able to be more
> > specific than with bin/post.  For instance:
> > a. specify the CSV delimiter, CSV quote character, and multivalued field
> > delimiter
>
> http://lucene.apache.org/solr/guide/7_4/uploading-data-with-index-handlers.html
> separator - (global and field local for multivalued)
> encapsulator - for CSV quote characters
>
> > b. the dynamic-fields feature let's you write plugins in Java to define
> > values (very simple example: combine field values f_name, m_name, l_name
> to
> > populate a full_name field)
> UpdateRequestProcessors. Your example specifically:
>
> > c. specify field order for mapping onto SOLR fields, data types, date
> > formats of source data; perhaps your CSV headers/JSON keys don't cleanly
> > map to SOLR field names
> > d. flag whether the first row of a CSV is the header and should not be
> > indexed
> > e. use literal values - e.g., instead of having to alter the source data
> to
> > have a column whose value is "foo" you can configure a field to always
> have
> > the same literal value for all documents
> > f. set the number of times to retry when there is an error and the amount
> > of time between retries (e.g., sometimes zk was not consistently
> responsive)
> > g. skip fields - e.g., your data have 10 columns but you only want to
> index
> > columns 1, 3, 5, and 9
> > h. send soft commits after a specified number of batches
> > i. combine fields to generate the uniqueKey value
> >
> > 3. Yes, atomic updates.  For instance, index data using DIH then use this
> > index to provide additional values to fields in those documents (e.g.,
> > maybe the extra data come from a different data source like BigQuery).
> >
> > I hope this brings more clarity to this tool's features and answers all
> > your questions.  Please ask questions if anyone has more.
> >
> > Dan
> >
> >
> > On Tue, Sep 18, 2018 at 3:21 PM Christopher Schultz <
> > ch...@christopherschultz.net> wrote:
> >
> >> -BEGIN PGP SIGNED MESSAGE-
> >> Hash: SHA256
> >>
> >> Dan,
> >>
> >> On 9/18/18 2:51 PM, Dan Brown wrote:
> >> > I've been working on this for a while and it's finally in a state
> >> > where it's ready for public consumption.
> >> >
> >> > This is a command line indexer that will index CSV or JSON
> >> > documents: https://github.com/likethecolor/solr-indexer
> >> >
> >> > There are quite a few parameters/options that can be set.
> >> >
> >> > One thing to note is that it will update individual fields.  That
> >> > is, unlike the Data Import Handler, it does not replace entire
> >> > documents.
> >> >
> >> > Please check it out and let me know what you think.
> >>
> >> How is this different from the bin/post tool that ships with Solr?
> >>
> >> Or is that you meant when you said "this is unlike the Data Import
> >> Handler".
> >>
> >> AIUI, Solr doesn't support updating a single field in a document. The
> >> document is replaced no matter how hard to try to be surgical about
> >> updating a single field.
> >>
> >> - -chris
> >> -BEGIN PGP SIGNATURE-
> >> Comment: GPGTools - http://gpgtools.org
> >> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> >>
> >> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhXlYACgkQHPApP6U8
> >> pFjIeQ/+PRIx+I+IDW9XTqGNV5TIWYf+yQKC/4JpTV4Ndj7MZLsEEw+cfMvFTvQt
> >> 44dK7CnDKEDgQHZlMccWKd9/Th1k/5g40VMugBMsayRwUc83Onawdi4HQfnig4et
> >> VN0/RaZ/IBo2AThsgEvUNplXYyY3BtyrUt6miiBsVkhKstI/BnmKqZvsRgvVjH0P
> >> K1Xc5F2LNyXswvoIZqd3YmEa9p7CYMy7COsFV9KOeSymKlB7UoHulZqpJ9MRYkmn
> >> YWjc9dHIRjpz5TUrJqWhZUG03uGXGtTnaXEku1Hb98WyIUZcHxkwN8W7qm6/B0CG
> >> inPxfGRFH9EbUdcK4qeXmbQqty2sbKMQ6hogpRd/NEzgSWjDapiEUT1xz+p5V6wG
> >> XM0ILaiLJ8zHJA6oUY0w5SNNyhdnd76CDpCK7T7YBm+aIxUDv9zoj6TLNceEaLi0
> >> SjfI83LvaR1gM/ZeVO77d+1IY9maU1+5m0EZFjAETfMGj5dwYRvBub0Oo6QQuLUm
> >> roF5R5b/bg/WjjPF1n4CJ7gTr/WBMzahKFnnQvoYD3OQqZpoasoEUifPpSd9OgvO
> >> yEok0VqwxPeXdHgE+Vy+BlXn6QqshB3BYnUSNbpFXlNsOIQojfJXkjcCa+dP1nyF
> >> JCElvmEgBG8K1WzGo4WAtVqJs7WDzQlmY2RDrETGsVbnqkTojXA=
> >> =AmkJ
> >> -END PGP SIGNATURE-
> >>
>

Re: Command Line Indexer


On 9/18/2018 2:21 PM, Christopher Schultz wrote:

AIUI, Solr doesn't support updating a single field in a document. The
document is replaced no matter how hard to try to be surgical about
updating a single field.


Solr does have Atomic Update functionality.  For this to work, the index 
must be appropriately configured.  Many indexes do not qualify.  Atomic 
Updates let the user send a request that is basically an update for 
individual fields rather than the full document.  Solr will read the 
existing index data and translate that request internally to a full 
document update.  The user thinks they are just updating a portion of 
the document, but Solr still indexes the whole thing.


There is also the In-Place Update feature, which is a lot closer to 
localized surgery, as it involves rewriting a portion of the docValues 
file for a segment, not indexing a new document. The field definition 
requirements for this are pretty extreme -- docValues ONLY.  Depending 
on the size of the segment containing the document, this might be slower 
than simply indexing the full document again.


Thanks,
Shawn

SOLR 7.0 DIH out of memory issue with sqlserver

2018-09-18 Thread Tanya Bompi

Hi,
  I have the SOLR 7.0 setup with the DataImportHandler connecting to the
sql server db. I keep getting OutOfMemory: Java Heap Space when doing a
full import. The size of the records is around 3 million so not very huge.
I tried the following steps and nothing helped thus far.

1. Setting the "responseBuffering=adaptive;selectMethod=Cursor" in the jdbc
connection string.
2. Setting the batchSize="-1" which hasnt helped
3. Increasing the heap size at solr startup by issuing the command \solr
start -m 1024m -p 8983
Increasing the heap size further doesnt start SOLR instance itself.

I am wondering what could be causing the issue and how to resolve this.
Below is the data-config :


  
  

Thanks,
Tanya

Re: Command Line Indexer

Uhm, inline:

On 18 September 2018 at 17:05, Dan Brown  wrote:
> 1. Thank you.
>
> 2. I think this is what you're looking for.  You'd be able to be more
> specific than with bin/post.  For instance:
> a. specify the CSV delimiter, CSV quote character, and multivalued field
> delimiter
http://lucene.apache.org/solr/guide/7_4/uploading-data-with-index-handlers.html
separator - (global and field local for multivalued)
encapsulator - for CSV quote characters

> b. the dynamic-fields feature let's you write plugins in Java to define
> values (very simple example: combine field values f_name, m_name, l_name to
> populate a full_name field)
UpdateRequestProcessors. Your example specifically:

> c. specify field order for mapping onto SOLR fields, data types, date
> formats of source data; perhaps your CSV headers/JSON keys don't cleanly
> map to SOLR field names
> d. flag whether the first row of a CSV is the header and should not be
> indexed
> e. use literal values - e.g., instead of having to alter the source data to
> have a column whose value is "foo" you can configure a field to always have
> the same literal value for all documents
> f. set the number of times to retry when there is an error and the amount
> of time between retries (e.g., sometimes zk was not consistently responsive)
> g. skip fields - e.g., your data have 10 columns but you only want to index
> columns 1, 3, 5, and 9
> h. send soft commits after a specified number of batches
> i. combine fields to generate the uniqueKey value
>
> 3. Yes, atomic updates.  For instance, index data using DIH then use this
> index to provide additional values to fields in those documents (e.g.,
> maybe the extra data come from a different data source like BigQuery).
>
> I hope this brings more clarity to this tool's features and answers all
> your questions.  Please ask questions if anyone has more.
>
> Dan
>
>
> On Tue, Sep 18, 2018 at 3:21 PM Christopher Schultz <
> ch...@christopherschultz.net> wrote:
>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> Dan,
>>
>> On 9/18/18 2:51 PM, Dan Brown wrote:
>> > I've been working on this for a while and it's finally in a state
>> > where it's ready for public consumption.
>> >
>> > This is a command line indexer that will index CSV or JSON
>> > documents: https://github.com/likethecolor/solr-indexer
>> >
>> > There are quite a few parameters/options that can be set.
>> >
>> > One thing to note is that it will update individual fields.  That
>> > is, unlike the Data Import Handler, it does not replace entire
>> > documents.
>> >
>> > Please check it out and let me know what you think.
>>
>> How is this different from the bin/post tool that ships with Solr?
>>
>> Or is that you meant when you said "this is unlike the Data Import
>> Handler".
>>
>> AIUI, Solr doesn't support updating a single field in a document. The
>> document is replaced no matter how hard to try to be surgical about
>> updating a single field.
>>
>> - -chris
>> -BEGIN PGP SIGNATURE-
>> Comment: GPGTools - http://gpgtools.org
>> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>>
>> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhXlYACgkQHPApP6U8
>> pFjIeQ/+PRIx+I+IDW9XTqGNV5TIWYf+yQKC/4JpTV4Ndj7MZLsEEw+cfMvFTvQt
>> 44dK7CnDKEDgQHZlMccWKd9/Th1k/5g40VMugBMsayRwUc83Onawdi4HQfnig4et
>> VN0/RaZ/IBo2AThsgEvUNplXYyY3BtyrUt6miiBsVkhKstI/BnmKqZvsRgvVjH0P
>> K1Xc5F2LNyXswvoIZqd3YmEa9p7CYMy7COsFV9KOeSymKlB7UoHulZqpJ9MRYkmn
>> YWjc9dHIRjpz5TUrJqWhZUG03uGXGtTnaXEku1Hb98WyIUZcHxkwN8W7qm6/B0CG
>> inPxfGRFH9EbUdcK4qeXmbQqty2sbKMQ6hogpRd/NEzgSWjDapiEUT1xz+p5V6wG
>> XM0ILaiLJ8zHJA6oUY0w5SNNyhdnd76CDpCK7T7YBm+aIxUDv9zoj6TLNceEaLi0
>> SjfI83LvaR1gM/ZeVO77d+1IY9maU1+5m0EZFjAETfMGj5dwYRvBub0Oo6QQuLUm
>> roF5R5b/bg/WjjPF1n4CJ7gTr/WBMzahKFnnQvoYD3OQqZpoasoEUifPpSd9OgvO
>> yEok0VqwxPeXdHgE+Vy+BlXn6QqshB3BYnUSNbpFXlNsOIQojfJXkjcCa+dP1nyF
>> JCElvmEgBG8K1WzGo4WAtVqJs7WDzQlmY2RDrETGsVbnqkTojXA=
>> =AmkJ
>> -END PGP SIGNATURE-
>>

Re: Command Line Indexer

Oops, premature send.

But basically, nearly all the items below seem to be a mix of things
that CSV can already do or that URP can already do or would be the
good place to inject that as a plugin. E.g.
http://lucene.apache.org/solr/guide/7_4/update-request-processors.html#templateupdateprocessorfactory

Not that I am saying your project has no place to exist. I am just
saying that it would benefit from a higher-level explanation that
clearly differentiates it from what Solr already does.

Regards,
   Alex.

On 18 September 2018 at 17:16, Alexandre Rafalovitch  wrote:
> Uhm, inline:
>
> On 18 September 2018 at 17:05, Dan Brown  wrote:
>> 1. Thank you.
>>
>> 2. I think this is what you're looking for.  You'd be able to be more
>> specific than with bin/post.  For instance:
>> a. specify the CSV delimiter, CSV quote character, and multivalued field
>> delimiter
> http://lucene.apache.org/solr/guide/7_4/uploading-data-with-index-handlers.html
> separator - (global and field local for multivalued)
> encapsulator - for CSV quote characters
>
>> b. the dynamic-fields feature let's you write plugins in Java to define
>> values (very simple example: combine field values f_name, m_name, l_name to
>> populate a full_name field)
> UpdateRequestProcessors. Your example specifically:
>
>> c. specify field order for mapping onto SOLR fields, data types, date
>> formats of source data; perhaps your CSV headers/JSON keys don't cleanly
>> map to SOLR field names
>> d. flag whether the first row of a CSV is the header and should not be
>> indexed
>> e. use literal values - e.g., instead of having to alter the source data to
>> have a column whose value is "foo" you can configure a field to always have
>> the same literal value for all documents
>> f. set the number of times to retry when there is an error and the amount
>> of time between retries (e.g., sometimes zk was not consistently responsive)
>> g. skip fields - e.g., your data have 10 columns but you only want to index
>> columns 1, 3, 5, and 9
>> h. send soft commits after a specified number of batches
>> i. combine fields to generate the uniqueKey value
>>
>> 3. Yes, atomic updates.  For instance, index data using DIH then use this
>> index to provide additional values to fields in those documents (e.g.,
>> maybe the extra data come from a different data source like BigQuery).
>>
>> I hope this brings more clarity to this tool's features and answers all
>> your questions.  Please ask questions if anyone has more.
>>
>> Dan
>>
>>
>> On Tue, Sep 18, 2018 at 3:21 PM Christopher Schultz <
>> ch...@christopherschultz.net> wrote:
>>
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA256
>>>
>>> Dan,
>>>
>>> On 9/18/18 2:51 PM, Dan Brown wrote:
>>> > I've been working on this for a while and it's finally in a state
>>> > where it's ready for public consumption.
>>> >
>>> > This is a command line indexer that will index CSV or JSON
>>> > documents: https://github.com/likethecolor/solr-indexer
>>> >
>>> > There are quite a few parameters/options that can be set.
>>> >
>>> > One thing to note is that it will update individual fields.  That
>>> > is, unlike the Data Import Handler, it does not replace entire
>>> > documents.
>>> >
>>> > Please check it out and let me know what you think.
>>>
>>> How is this different from the bin/post tool that ships with Solr?
>>>
>>> Or is that you meant when you said "this is unlike the Data Import
>>> Handler".
>>>
>>> AIUI, Solr doesn't support updating a single field in a document. The
>>> document is replaced no matter how hard to try to be surgical about
>>> updating a single field.
>>>
>>> - -chris
>>> -BEGIN PGP SIGNATURE-
>>> Comment: GPGTools - http://gpgtools.org
>>> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>>>
>>> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhXlYACgkQHPApP6U8
>>> pFjIeQ/+PRIx+I+IDW9XTqGNV5TIWYf+yQKC/4JpTV4Ndj7MZLsEEw+cfMvFTvQt
>>> 44dK7CnDKEDgQHZlMccWKd9/Th1k/5g40VMugBMsayRwUc83Onawdi4HQfnig4et
>>> VN0/RaZ/IBo2AThsgEvUNplXYyY3BtyrUt6miiBsVkhKstI/BnmKqZvsRgvVjH0P
>>> K1Xc5F2LNyXswvoIZqd3YmEa9p7CYMy7COsFV9KOeSymKlB7UoHulZqpJ9MRYkmn
>>> YWjc9dHIRjpz5TUrJqWhZUG03uGXGtTnaXEku1Hb98WyIUZcHxkwN8W7qm6/B0CG
>>> inPxfGRFH9EbUdcK4qeXmbQqty2sbKMQ6hogpRd/NEzgSWjDapiEUT1xz+p5V6wG
>>> XM0ILaiLJ8zHJA6oUY0w5SNNyhdnd76CDpCK7T7YBm+aIxUDv9zoj6TLNceEaLi0
>>> SjfI83LvaR1gM/ZeVO77d+1IY9maU1+5m0EZFjAETfMGj5dwYRvBub0Oo6QQuLUm
>>> roF5R5b/bg/WjjPF1n4CJ7gTr/WBMzahKFnnQvoYD3OQqZpoasoEUifPpSd9OgvO
>>> yEok0VqwxPeXdHgE+Vy+BlXn6QqshB3BYnUSNbpFXlNsOIQojfJXkjcCa+dP1nyF
>>> JCElvmEgBG8K1WzGo4WAtVqJs7WDzQlmY2RDrETGsVbnqkTojXA=
>>> =AmkJ
>>> -END PGP SIGNATURE-
>>>

Re: Solr 7.2.1 Collection Backup Performance issue

2018-09-18 Thread Ganesh Sethuraman

Thanks for the information. I thought backup is going to be more of the
disk activity. But I understand now that RAM is involved here as well. We
indeed did NOT have enough memory in this box, as it is 64GB box with index
size of 72GB, being backed up. The read (real time GET) performance was
better without BACKUP, could be because there was minimal disk access, but
with Backup running, reads (GET) are probably doing disk read  for every
request.

Thanks,
Ganesh

On Tue, Sep 18, 2018 at 3:43 PM Shawn Heisey  wrote:

> On 9/18/2018 11:00 AM, Ganesh Sethuraman wrote:
> > We are using Solr 7.2.1 with SolrCloud with 35 collections with 1 node ZK
> > ensemble (in lower environment, we will have 3 nodes ensemble) in AWS. We
> > are testing to see if we have Async Solr Cloud backup  (
> > https://lucene.apache.org/solr/guide/7_2/collections-api.html#backup)
> done
> > every time we are create a new collection or update an existing
> collection.
> > There are 1 replica and 8 shards per collection. Two Solr nodes.
> >
> > For the largest collection (index size of 80GB), we see that BACKUP to
> the
> > EFS drive takes about ~10 mins. We are doing lot of /get (real time get)
> > option from the application. We are seeing that that the performance
> > significantly (2x) degrades on the read (get) performance when we BACK-UP
> > is going on in parallel.
>
> My best guess here is that you do not have enough memory. For good
> performance, Solr is extremely reliant on having certain parts of the
> index data sitting in memory, so that it doesn't have to actually read
> the disk to discover matches for a query.  When all is working well,
> that data will be read from memory instead of the disk.  Memory is MUCH
> MUCH faster than a disk.
>
> Making a backup is going to read ALL of the index data.  So if you do
> not have enough spare memory to cache the entire index, reading the
> index to make the backup is going to push the important parts of the
> index out of the cache, and then Solr will have to actually go and read
> the disk in order to satisfy a query.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
>
> Can you gather a screenshot of your process list and put it on a file
> sharing website?  You'll find instructions on how to do this here:
>
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
>
> Thanks,
> Shawn
>
>

Is that solr supports multi version operations?

2018-09-18 Thread zhenyuan wei

Hi all,
add solr document with overwrite=false will keepping multi version
documents，
My question is ：
1.  How to search newest documents？with what options？
2.  How to delete  old version < newest version  documents?

for example:
 {
"id":"1002",
"name":["james"],
"_version_":1611998319085617152,
"name_str":["james"]},
  {
"id":"1002",
"name":["lily"],
"_version_":1611998307815522304,
"name_str":["lily"]},
  {
"id":"1002",
"name":["lucy"],
"_version_":1611998248265842688,
"name_str":["lucy"]}]

1. curl  http://localhost:8983/solr/collection001/query?q=*:*   return all
，
how to search to make response return the newest one？
2. how to delete  document of version
[1611998307815522304，1611998248265842688] ,
which is older then 1611998319085617152.

RE: Modify Schema for Solr Cloud

2018-09-18 Thread Rathor, Piyush (US - Philadelphia)

Thanks Jan.

I used Schema API to do it. 

Thanks & Regards
Piyush Rathor
Consultant
Deloitte Digital (Salesforce.com / Force.com)
Deloitte Consulting Pvt. Ltd.
Office: +1 (615) 209 4980
Mobile : +1 (302) 397 1491
prat...@deloitte.com | www.deloitte.com

Please consider the environment before printing.

-Original Message-
From: Jan Høydahl  
Sent: Tuesday, September 18, 2018 3:22 PM
To: solr-user@lucene.apache.org
Subject: [EXT] Re: Modify Schema for Solr Cloud

Three ways:

1. Use Admin UI Schema tab and add/delete fields/copyfields there. Not support 
for fieldTypes 2. Use Schema API, see ref.guide 3. bin/solr zk cp 
zk:/configs/myconfig/managed-schema .
go ahead edit schema
bin/solr zk cp managed-schema zk:/configs/myconfig/managed-schema
reload collection in UI

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 18. sep. 2018 kl. 07:30 skrev Rathor, Piyush (US - Philadelphia) 
> :
> 
> Hi All,
>  
> I am new to solr cloud.
>  
> Can you please let me know on how to update the schema on solr cloud.
>  
> Thanks & Regards
> Piyush Rathor
> Consultant
> Deloitte Digital (Salesforce.com  / Force.com 
> ) Deloitte Consulting Pvt. Ltd.
> Office: +1 (615) 209 4980
> Mobile : +1 (302) 397 1491
> prat...@deloitte.com  | www.deloitte.com 
> 
> 
> Please consider the environment before printing.
>  
> This message (including any attachments) contains confidential information 
> intended for a specific individual and purpose, and is protected by law. If 
> you are not the intended recipient, you should delete this message and any 
> disclosure, copying, or distribution of this message, or the taking of any 
> action based on it, by you is strictly prohibited.
> 
> v.E.1
>

RE: Modify Schema for Solr Cloud

2018-09-18 Thread Rathor, Piyush (US - Philadelphia)

Thanks Yasufumi.

I will check this option. I used schema API to make the changes.

Thanks & Regards
Piyush Rathor
Consultant
Deloitte Digital (Salesforce.com / Force.com)
Deloitte Consulting Pvt. Ltd.
Office: +1 (615) 209 4980
Mobile : +1 (302) 397 1491
prat...@deloitte.com | www.deloitte.com

Please consider the environment before printing.

-Original Message-
From: Yasufumi Mizoguchi  
Sent: Tuesday, September 18, 2018 4:05 AM
To: solr-user@lucene.apache.org
Subject: [EXT] Re: Modify Schema for Solr Cloud

Hi,

One way is re-upload config files via zkcli.sh and reload the collection.
See following.
https://lucene.apache.org/solr/guide/7_4/command-line-utilities.html

Thanks,
Yasufumi.

2018年9月18日(火) 14:30 Rathor, Piyush (US - Philadelphia) :

> Hi All,
>
>
>
> I am new to solr cloud.
>
>
>
> Can you please let me know on how to update the schema on solr cloud.
>
>
>
> *Thanks & Regards*
>
> *Piyush Rathor*
>
> Consultant
>
> Deloitte Digital (Salesforce.com / Force.com)
>
> Deloitte Consulting Pvt. Ltd.
>
> *Office*: +1 (615) 209 4980
>
> *Mobile *: +1 (302) 397 1491
>
> prat...@deloitte.com | www.deloitte.com
>
> [image: cid:image001.png@01D012F3.6C4D42E0]
>
> Please consider the environment before printing.
>
>
>
> This message (including any attachments) contains confidential 
> information intended for a specific individual and purpose, and is 
> protected by law. If you are not the intended recipient, you should 
> delete this message and any disclosure, copying, or distribution of 
> this message, or the taking of any action based on it, by you is strictly 
> prohibited.
>
> v.E.1
>

Re: SOLR 7.0 DIH out of memory issue with sqlserver


On 9/18/2018 4:48 PM, Tanya Bompi wrote:

   I have the SOLR 7.0 setup with the DataImportHandler connecting to the
sql server db. I keep getting OutOfMemory: Java Heap Space when doing a
full import. The size of the records is around 3 million so not very huge.
I tried the following steps and nothing helped thus far.


See this wiki page:

https://wiki.apache.org/solr/DataImportHandlerFaq

You already have the suggested fix -- setting responseBuffering to 
adaptive.  You might try upgrading the driver.  If that doesn't work, 
you're probably going to need to talk to Microsoft about what you need 
to do differently on the JDBC url.


I did find this page:

https://docs.microsoft.com/en-us/sql/connect/jdbc/using-adaptive-buffering?view=sql-server-2017

This says that when using adaptive buffering, you should avoid using 
selectMethod=cursor.  So you should try removing that parameter.


Thanks,
Shawn

Re: Implementing NeuralNetworkModel RankNet in Solr LTR

2018-09-18 Thread Koji Sekiguchi

Hi,

> https://github.com/airalcorn2/Solr-LTR#RankNet
>
> Has anyone tried on this before? And what is the format of the training
> data that this model requires?

I haven't tried it, but I'd like to inform you that there is another project of LTR we've been 
developed:

https://github.com/LTR4L/ltr4l

It has many LTR algorithms based on neural network, SVM and boosting.

Koji

On 2018/09/12 11:44, Zheng Lin Edwin Yeo wrote:

Hi,

I am working on to implementing Solr LTR in Solr 7.4.0 by using the
NeuralNetworkModel for the feature selection and model training, and I have
found this site which uses RankNet:
https://github.com/airalcorn2/Solr-LTR#RankNet

Has anyone tried on this before? And what is the format of the training
data that this model requires?

Regards,
Edwin

Re: TolerantUpdateProcessorFactory maxErrors=-1 issue

2018-09-18 Thread Derek Poh

In addition, I tried withmaxErrors=3 and with only 1error document, the 
indexing process still gets aborted.


Could it be the way I defined the TolerantUpdateProcessorFactory in 
solrconfg.xml?


On 18/9/2018 3:13 PM, Derek Poh wrote:

Hi

I am using CSV formatted indexupdates to index on tab delimited file.

I have define "TolerantUpdateProcessorFactory" with "maxErrors=-1" in 
the solrconfig.xml to skip any document update error and proceed to 
update the remaining documents without failing.
Howeverit does not seemto be workingas there is an document in the tab 
delimited file withadditional number of fields and this caused the 
indexing to abort instead.


This is how I start the indexing,
curl -o /apps/search/logs/indexing.log 
"http://localhost:8983/solr/$collection/update?update.chain=$updateChainName=true=%09=^=$fieldnames$splitOptions; 
--data-binary "@/apps/search/feed/$csvFilePath/$csvFileName" -H 
'Content-type:application/csv'


This is how the TolerantUpdateProcessorFactory is defined in the 
solrconfig.xml,


  
    P_SupplierId
    P_TradeShowId
    P_ProductId
    id
  
  
    id
    
  
  
 -1
  
  
    
    
    43200
    P_TradeShowOnlineEndDateUTC
  
  
  


Solr version is 6.6.2.

Derek

--
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential 
and/or privileged information. If you are not the intended recipient 
or have received this e-mail in error, please inform the sender 
immediately and delete this e-mail (including any attachments) from 
your computer, and you must not use, disclose to anyone else or copy 
this e-mail (including any attachments), whether in whole or in part.
This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.



--
CONFIDENTIALITY NOTICE 

This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. 


This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.

synonyms for Solr Cloud -

2018-09-18 Thread Rathor, Piyush (US - Philadelphia)

Hi All,

How can we add a synonyms text file to solr cloud. I have a text file with 
comma separated synonyms.


Thanks & Regards
Piyush Rathor
Consultant
Deloitte Digital (Salesforce.com / Force.com)
Deloitte Consulting Pvt. Ltd.
Office: +1 (615) 209 4980
Mobile : +1 (302) 397 1491
prat...@deloitte.com | 
www.deloitte.com
[cid:image001.png@01D012F3.6C4D42E0]
Please consider the environment before printing.


This message (including any attachments) contains confidential information 
intended for a specific individual and purpose, and is protected by law. If you 
are not the intended recipient, you should delete this message and any 
disclosure, copying, or distribution of this message, or the taking of any 
action based on it, by you is strictly prohibited.

v.E.1

Re: Is that solr supports multi version operations?

2018-09-18 Thread Walter Underwood

No. Solr only has one version of a document. It is not a multi-version database.

Each replica will return the newest version it has.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 18, 2018, at 7:11 PM, zhenyuan wei  wrote:
> 
> Hi all,
>add solr document with overwrite=false will keepping multi version
> documents，
> My question is ：
>1.  How to search newest documents？with what options？
>2.  How to delete  old version < newest version  documents?
> 
> for example:
> {
>"id":"1002",
>"name":["james"],
>"_version_":1611998319085617152,
>"name_str":["james"]},
>  {
>"id":"1002",
>"name":["lily"],
>"_version_":1611998307815522304,
>"name_str":["lily"]},
>  {
>"id":"1002",
>"name":["lucy"],
>"_version_":1611998248265842688,
>"name_str":["lucy"]}]
> 
> 1. curl  http://localhost:8983/solr/collection001/query?q=*:*   return all
> ，
>how to search to make response return the newest one？
> 2. how to delete  document of version
> [1611998307815522304，1611998248265842688] ,
> which is older then 1611998319085617152.

Re: Is that solr supports multi version operations?

I think if you try hard enough, it is possible to get Solr to keep
multiple documents that would normally keep only the latest version.
They will just have different internal lucene id.

This may of course break a lot of other things like SolrCloud and
possibly facet counts.

So, I would ask the actual business case first. It is entirely
possible that there are other ways to achieve the desired objectives.

Regards,
   Alex.

On 19 September 2018 at 00:17, Walter Underwood  wrote:
> No. Solr only has one version of a document. It is not a multi-version 
> database.
>
> Each replica will return the newest version it has.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>> On Sep 18, 2018, at 7:11 PM, zhenyuan wei  wrote:
>>
>> Hi all,
>>add solr document with overwrite=false will keepping multi version
>> documents，
>> My question is ：
>>1.  How to search newest documents？with what options？
>>2.  How to delete  old version < newest version  documents?
>>
>> for example:
>> {
>>"id":"1002",
>>"name":["james"],
>>"_version_":1611998319085617152,
>>"name_str":["james"]},
>>  {
>>"id":"1002",
>>"name":["lily"],
>>"_version_":1611998307815522304,
>>"name_str":["lily"]},
>>  {
>>"id":"1002",
>>"name":["lucy"],
>>"_version_":1611998248265842688,
>>"name_str":["lucy"]}]
>>
>> 1. curl  http://localhost:8983/solr/collection001/query?q=*:*   return all
>> ，
>>how to search to make response return the newest one？
>> 2. how to delete  document of version
>> [1611998307815522304，1611998248265842688] ,
>> which is older then 1611998319085617152.
>

TolerantUpdateProcessorFactory maxErrors=-1 issue

2018-09-18 Thread Derek Poh


Hi

I am using CSV formatted indexupdates to index on tab delimited file.

I have define "TolerantUpdateProcessorFactory" with "maxErrors=-1" in 
the solrconfig.xml to skip any document update error and proceed to 
update the remaining documents without failing.
Howeverit does not seemto be workingas there is an document in the tab 
delimited file withadditional number of fields and this caused the 
indexing to abort instead.


This is how I start the indexing,
curl -o /apps/search/logs/indexing.log 
"http://localhost:8983/solr/$collection/update?update.chain=$updateChainName=true=%09=^=$fieldnames$splitOptions; 
--data-binary "@/apps/search/feed/$csvFilePath/$csvFileName" -H 
'Content-type:application/csv'


This is how the TolerantUpdateProcessorFactory is defined in the 
solrconfig.xml,


  
    P_SupplierId
    P_TradeShowId
    P_ProductId
    id
  
  
    id
    
  
  
 -1
  
  
    
    
    43200
    P_TradeShowOnlineEndDateUTC
  
  
  


Solr version is 6.6.2.

Derek

--
CONFIDENTIALITY NOTICE 

This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. 


This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.

Re: using uuid for documents

2018-09-18 Thread Alfonso Muñoz-Pomer Fuentes

Hi Zahra,

I’m not sure I understand your question. Could you explain with more detail 
what it is that you want to achieve?

> On 18 Sep 2018, at 06:00, Zahra Aminolroaya  wrote:
> 
> Hello Alfonso,
> 
> 
> Thanks. You used the dedupe updateRequestProcessorChain, so for this
> application we cannot use the uuid updateRequestProcessorChain
> individually?!
> 
> 
> Best,
> Zahra
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

--
Alfonso Muñoz-Pomer Fuentes
Senior Lead Software Engineer @ Gene Expression Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer

Re: 20180917-Need Apache SOLR support

2018-09-18 Thread zhenyuan wei

I have 6 machines，and each machine run a solr server, each solr server use
RAM 18GB.  Total document number is 3.2billion，1.4TB ，
my collection‘s replica factor is 1。collection shard number is
 60，currently each shard is 20~30GB。
15 fields per document。 Query rate is slow now，maybe 100-500 requests per
second.

Shawn Heisey  于2018年9月18日周二 下午12:07写道：

> On 9/17/2018 9:05 PM, zhenyuan wei wrote:
> > Is that means： Small amount of shards  gains  better performance？
> > I also have a usecase which contains 3 billion documents，the collection
> > contains 60 shard now. Is that 10 shard is better than 60 shard?
>
> There is no definite answer to this question.  It depends on a bunch of
> things.  How big is each shard once it's finally built?  What's your
> query rate?  How many machines do you have, and how much memory do those
> machines have?
>
> Thanks,
> Shawn
>
>

RE: 20180917-Need Apache SOLR support

2018-09-18 Thread Liu, Daphne

You have to increase your RAM. We have upgraded our Solr cluster to  12 solr 
nodes, each with 64G RAM, our shard size is around 25G, each server only hosts 
either one shard ( leading node or replica),  Performance is very good.
For better performance, memory needs to be over your shard size.

Kind regards,

Daphne Liu
BI Architect • Big Data - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com
T 904.9281448 / F 904.928.1525 / daphne@cevalogistics.com

Making business flow

-Original Message-
From: zhenyuan wei 
Sent: Tuesday, September 18, 2018 3:12 AM
To: solr-user@lucene.apache.org
Subject: Re: 20180917-Need Apache SOLR support

I have 6 machines，and each machine run a solr server, each solr server use RAM 
18GB.  Total document number is 3.2billion，1.4TB ，
my collection‘s replica factor is 1。collection shard number is
 60，currently each shard is 20~30GB。
15 fields per document。 Query rate is slow now，maybe 100-500 requests per 
second.

Shawn Heisey  于2018年9月18日周二 下午12:07写道：

> On 9/17/2018 9:05 PM, zhenyuan wei wrote:
> > Is that means： Small amount of shards  gains  better performance？
> > I also have a usecase which contains 3 billion documents，the
> > collection contains 60 shard now. Is that 10 shard is better than 60 shard?
>
> There is no definite answer to this question.  It depends on a bunch
> of things.  How big is each shard once it's finally built?  What's
> your query rate?  How many machines do you have, and how much memory
> do those machines have?
>
> Thanks,
> Shawn
>
>

NVOCC Services are provided by CEVA as agents for and on behalf of Pyramid 
Lines Limited trading as Pyramid Lines.
This e-mail message is intended for the above named recipient(s) only. It may 
contain confidential information that is privileged. If you are not the 
intended recipient, you are hereby notified that any dissemination, 
distribution or copying of this e-mail and any attachment(s) is strictly 
prohibited. If you have received this e-mail by error, please immediately 
notify the sender by replying to this e-mail and deleting the message including 
any attachment(s) from your system. Thank you in advance for your cooperation 
and assistance. Although the company has taken reasonable precautions to ensure 
no viruses are present in this email, the company cannot accept responsibility 
for any loss or damage arising from the use of this email or attachments.

Re: Does solr support rollback or any method to do the same job？

2018-09-18 Thread Mikhail Khludnev

Surprisingly, you can delete recently added doc not yet committed doc,
Lucene tracks the sequence and after following commit there will be no that
document.
"To rollback" delete/update one need to send original doc. But literally
there'is no fine grained rollback operation.

On Tue, Sep 18, 2018 at 1:04 PM zhenyuan wei  wrote:

> Hi all，
>   Does solr support rollback or any method to do the same job？
> Like update/add/delete a document, can I rollback them？
>
> Best~
> TinsWzy
>

-- 
Sincerely yours
Mikhail Khludnev

Re: Does solr support rollback or any method to do the same job？


On 9/18/2018 4:03 AM, zhenyuan wei wrote:

   Does solr support rollback or any method to do the same job？
Like update/add/delete a document, can I rollback them？


With SolrCloud, rollback is not supported.  This is because a typical 
SolrCloud install spreads the index across multiple servers.  Each 
server has no idea what the other servers are doing as far as indexing.  
So rollback is disabled in cloud mode.


In standalone mode where each index core is a separate entity with no 
connection to any others, you can do a rollback ... but it will NOT be 
the document you just added/deleted/changed, it will be *EVERY* change 
since the last searcher was opened.  Solr does not support transactions 
like a database does.


https://lucene.apache.org/solr/guide/7_4/uploading-data-with-index-handlers.html#rollback-operations

Thanks,
Shawn

Re: user field(uf) not working with Post filter


On 9/5/2018 7:17 AM, shruti suri wrote:

I am using a custom handler with edismax parser. I am using uf parameter in
the handler to restrict some fields from search.  But uf is not working with
post filter(fq). I want to restricted same fields in fq, so that people
could not filter on some fields. Please suggest how can I do that.


What is the fq value?  This has not been shared.

Unless you explicitly request a parser *IN THE FQ PARAMETER VALUE*, the 
filter will use the Lucene parser.  And it will not be a postfilter if 
it is using lucene, dismax, edismax or certain other parsers -- those 
parsers do not implement PostFilter.  The lucene parser does not support 
the uf parameter.


I would say that it is not normal practice to put user input into fq 
without modification.  That should go into q, and there should probably 
be some kind of validation in your code on user input before it gets to 
Solr to look for possible problems.


Adding {!edismax} to the start of the fq value *might* make it honor the 
uf parameter, but I have not tried this.


Thanks,
Shawn

Re: 20180917-Need Apache SOLR support


On 9/18/2018 1:11 AM, zhenyuan wei wrote:

I have 6 machines，and each machine run a solr server, each solr server use
RAM 18GB.  Total document number is 3.2billion，1.4TB ，
my collection‘s replica factor is 1。collection shard number is
  60，currently each shard is 20~30GB。
15 fields per document。 Query rate is slow now，maybe 100-500 requests per
second.


That is NOT a slow query rate.  In the recent past, I was the 
administrator to a Solr install.  When things got *REALLY BUSY*, the 
servers would see as many as five requests per second. Usually the 
request rate was less than one per second.  A high request rate can 
drastically impact overall performance.


I have heard of big Solr installs that handle thousands of requests per 
second, which is certainly larger than yours ... but 100-500 is NOT 
slow.  I'm surprised that you can get acceptable performance on an index 
that big, with that many queries, and only six machines.  Congratulations.


Despite appearances, I wasn't actually asking you for this information.  
I was telling you that those things would all be factors in the decision 
about how many shards you should have.Perhaps I should have worded the 
message differently.


See this page for a discussion about how total memory size and index 
size affect performance:


https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn

TolerantUpdateProcessorFactory maxErrors settings issue

2018-09-18 Thread Derek Poh


Hi

I am using CSV formatted indexupdates to index on tab delimited file.

I have define "TolerantUpdateProcessorFactory" with "maxErrors=-1" in 
the solrconfig.xml to skip any document update error and proceed to 
update the remaining documents without failing.
Howeverit does not seemto be workingas there is an document in the tab 
delimited file withadditional number of fields and this caused the 
indexing to abort instead.


This is how I start the indexing,
curl -o /apps/search/logs/indexing.log 
"http://localhost:8983/solr/$collection/update?update.chain=$updateChainName=true=%09=^=$fieldnames$splitOptions; 
--data-binary "@/apps/search/feed/$csvFilePath/$csvFileName" -H 
'Content-type:application/csv'


This is how the TolerantUpdateProcessorFactory is defined in the 
solrconfig.xml,


  
    P_SupplierId
    P_TradeShowId
    P_ProductId
    id
  
  
    id
    
  
  
 -1
  
  
    
    
    43200
    P_TradeShowOnlineEndDateUTC
  
  
  


Solr version is 6.6.2.

Derek

--
CONFIDENTIALITY NOTICE 

This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. 


This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.

Re: user field(uf) not working with Post filter

2018-09-18 Thread shruti suri

Hi Zheng,

I am using version 6.1.0. Basically, I want few fields to be blocked in fq.

Thanks



-
Regards
Shruti
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

ArrayList cannot be cast to a java.lang.String

2018-09-18 Thread Nathan Friend

Hello,

I'm setting up a new Solr server and am running into an issues I haven't 
experienced in previous Solr installations.  When I navigate to a core's 
"Dataimport" tab (without even triggering an import request), several of the 
HTTP requests made by the admin UI fail.  Checking the Solr logs, I see this 
stacktrace:

java.lang.ClassCastException: java.util.Arrays$ArrayList cannot be cast to 
java.lang.String at 
org.apache.solr.handler.dataimport.RequestInfo.(RequestInfo.java:52) at 
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:131)
 at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503) at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711) at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:517) at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533) 
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) 
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) 
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) 
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
 at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473) 
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
 at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
 at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) 
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
 at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) 
at org.eclipse.jetty.server.Server.handle(Server.java:530) at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347) at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256) at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
 at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102) at 
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124) at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
 at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
 at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
 at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626) 
at java.lang.Thread.run(Thread.java:748)

This same error occurs when I begin a data import, or really any HTTP request 
made to the /solr//dataimport* endpoint.

It dug through the source and found the line where this is happening: 
https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/RequestInfo.java#L52

if (requestParams.containsKey("command")) { 
command = (String) requestParams.get("command");
}
My best guess is that the "command" parameter is somehow being included twice 
in request and is coming through as an ArrayList instead of a String.  In my 
solrconfig.xml, I define my "dataimportfull" handler like so:



./DIHconfig.xml
explicit
json
true
full
full-import
true
true
add-unknown-fields-to-the-schema



I tried removing the "command" line, but the error still occurred.

The main difference between this Solr instance and other Solr servers I've set 
up is that this instance is running inside of a Windows Server 2016 Docker 
container, using an image based

[SolrJ Client] Error calling add: connection is still allocated

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

Our single-instance Solr server is just getting its first taste of
production load, and I'm seeing this periodically:

java.lang.IllegalStateException: Connection is still allocated

The stack trace shows it's coming from HTTP Client as called from
within Solr.

We are using SolrJ 7.2.1 and Solr (server) 7.4.0.

Our code looks something like this:

private HashMap CLIENT_REGISTRY = new
HashMap();

synchronized HttpSolrClient getSolrClient(String url)
throws ServiceException, SolrServerException, IOException,
GeneralSecurityException
{
HttpSolrClient solrClient = CLIENT_REGISTRY.get(url);

if(null == solrClient) {
log.info("Creating new HttpSolrClient connected to " + url);

solrClient = new HttpSolrClient.Builder(url)
.withHttpClient(getHttpClient())
.build();

solrClient.ping();

CLIENT_REGISTRY.put(url, solrClient);
}

return solrClient;
}


[here's the code that uses the above]

SolrClient solr = getSolrRegistry().getSolrClient(url);

SolrInputDocument doc = new SolrInputDocument();

// Add stuff to the document

solr.add(doc);
solr.commit();

That's it.

Other than not really needing the "commit" at the end, is there
anything wrong with how we are using SolrJ client? Are instances of
SolrJClient not thread-safe? My assumption was that they were
threadsafe and that HTTP Client would manage the connection pool under
the covers.

Here is the full stack trace:

com.chadis.api.business.RegistrationProcessor- Error processing
registration request
java.lang.IllegalStateException: Connection is still allocated
at org.apache.http.util.Asserts.check(Asserts.java:34)
at
org.apache.http.impl.conn.BasicHttpClientConnectionManager.getConnection
(BasicHttpClientConnectionManager.java:251)
at
org.apache.http.impl.conn.BasicHttpClientConnectionManager$1.get(BasicHt
tpClientConnectionManager.java:202)
at
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.jav
a:191)
at
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:18
5)
at
org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:11
1)
at
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpCli
ent.java:185)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpCli
ent.java:83)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpCli
ent.java:56)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrC
lient.java:542)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.
java:255)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.
java:244)
at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at
org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
at
org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
at
org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
at [my code, calling SolrClient.add()]

Any ideas?

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhFVQACgkQHPApP6U8
pFhGeRAAgrg2GAmwhS9J/RBQC19SnebhevncmgMAF6nHhKegnXr8uv2fGvvySg53
BHCW0N3dtt9ZhI1VB7C9aBO65o/esW5rHi3/sIiY5QRfNIl39ajL8y98RWHJQEeA
mhjoqNdqW/GopA3YaiCmf1YJZ0FsZV7iK04KboD5DRwhsqoa8XVDa44RYfdU4iDP
cleMkQYY2KDSID0gJ2pf/Qj1acwR/hI2Q9+6kxc11/bXKCrWYAmLawV+DH6ZHqLF
HT/7bNNJ+zV0df0WEKzUDQ9wVzTKXkzvYP7ueINIiomyZN7Pv+pF58BaAiICdlUr
aqQMulLcKRC7qmN/5XqBZG00hkbH82n80o5foveTlQlC9yltSTbXjwFqd+FfOH8Y
kBU+mHWkrZr/Ic29LkgLLzX1tG+QoXAgoEAASHOockaTX5oj2vsyFYQ5nVddOMNj
/w1AgdpNztP5DLr1HQ6JhA+3nLZX43GaDxs/nENIOI2Xe36kXfS/so9Cv7DaAjQ8
OkGdOLUksQaukFZ/3MUwbgan5tQYYp4zSmky4RGS7Nd0ePTgvk4pH1uD4NFJnHWK
fsSydLT43tiOWltQkzzby6QcpSg9WrV+0zsnEPQSQHH+ubDbFt03aXS1/tjYAZTF
r8ttwGFfMQLa58hfWwBKMWtyM8m6n9gVMivhp5oENa3uFdo76kQ=
=+WJu
-END PGP SIGNATURE-

Re: [OT] 20180917-Need Apache SOLR support