Nested geofilt query for LTR feature

2019-03-14 Thread Kamuela Lau
Hello,

I'm currently using Solr 7.2.2 and trying to use the LTR contrib module to
rerank queries.
For my LTR model, I would like to use a feature that is essentially a
"normalized distance," a value between 0 and 1 which is based on distance.

When using geodist() to define a feature in the feature store, I received a
"failed to parse feature query" error, and thus I am using the below
geofilt query for distance.

{
  "name":"dist",
  "class":"org.apache.solr.ltr.feature.SolrFeature",
  "params":{"q":"{!geofilt sfield=latlon score=kilometers filter=false
pt=${ltrpt} d=5000}"},
  "store":"ltrFeatureStore"
}

This feature correctly returns the distance between ltrpt and the sfield
latlon (LatLonPointSpatialField).
As I mentioned previously, I would like a feature which uses this distance
in another function. To test this functionality, I tried to define a
feature which multiplies the distance by two:

{
  "name":"twoDist",
  "class":"org.apache.solr.ltr.feature.SolrFeature",
  "params":{"q":"{!func}product(2,query({!geofilt v= sfield=latlon
score=kilometers filter=false pt=${ltrpt} d=5000},0.0))"},
  "store":"ltrFeatureStore"
}

When trying to extract this feature, I receive the following error:

java.lang.RuntimeException: Exception from createWeight for SolrFeature
[name=multDist, params={q={!func}product(2,query({!geofilt v= sfield=latlon
score=kilometers filter=false pt=${ltrpt} d=5000},0.0))}]  missing sfield
for spatial request

However, when I define the following in fl for a regular, non-reranked
query, I find that it is correctly parsed and I receive the correct value,
which is twice the value of geodist() (pt2 is defined in a different part
of the query):
fl=score,geodist(),{!func}product(2,query({!geofilt v= sfield=latlon
score=kilometers filter=false pt=${pt2} d=5},0.0))

For reference, below is what I have defined in my schema:
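(The schema XML did not survive the mail archive; given the description above, the definitions were presumably along these lines, where the fieldType name is a guess but LatLonPointSpatialField matches the field mentioned earlier:)

  <fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true"/>
  <field name="latlon" type="location" indexed="true" stored="true"/>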

   


Is this the correct, intended behavior? If so, is my query for this
correct, or should I go about extracting this sort of feature a different
way?


Re: Authorization fails but api still renders

2019-03-14 Thread Zheng Lin Edwin Yeo
Hi,

Can't really catch your question. Are you facing the error 401 on all the
clusters or just one of them?

Also, which Solr version are you using?

Regards,
Edwin

On Fri, 15 Mar 2019 at 05:15, Branham, Jeremy (Experis) 
wrote:

> I’ve discovered the authorization works properly if I use the FQDN to
> access the Solr node, but the short hostname completely circumvents it.
> They are all internal server clusters, so I’m using self-signed
> certificates [the same exact certificate] on each. The SAN portion of the
> cert contains the IP, short, and FQDN of each server.
>
> I also diff’d the two servers Solr installation directories, and confirmed
> they are identical.
> They are using the same exact versions of Java and zookeeper, with the
> same chroot configuration. [different zk clusters]
>
>
> Jeremy Branham
> jb...@allstate.com
>
> On 3/14/19, 10:44 AM, "Branham, Jeremy (Experis)" 
> wrote:
>
> I’m using Basic Auth on 3 different clusters.
> On 2 of the clusters, authorization works fine. A 401 is returned when
> I try to access the core/collection apis.
>
> On the 3rd cluster I can see the authorization failed, but the api
> results are still returned.
>
> Solr.log
> 2019-03-14 09:25:47.680 INFO  (qtp1546693040-152) [   ]
> o.a.s.s.RuleBasedAuthorizationPlugin request has come without principal.
> failed permission {
>   "name":"core-admin-read",
>   "role":"*"}
>
>
> I’m using different zookeeper clusters for each solr cluster, but
> using the same security.json contents.
> I’ve tried refreshing the ZK node, and bringing the whole Solr cluster
> down and back up.
>
> Is there some sort of caching that could be happening?
>
> I wrote an installation script that I’ve used to setup each cluster,
> so I’m thinking I’ll wipe it out and re-run.
> But before I do this, I thought I’d ask the community for input. Maybe
> a bug?
>
>
> Jeremy Branham
> jb...@allstate.com
> Allstate Insurance Company | UCV Technology Services | Information
> Services Group
>
>
>
>


Re: Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Aroop Ganguly
correction:

Thanks Shalin, Shawn.

I ended up getting guidance from Anshum on this, and we did indeed use the 
delete-replica API to delete all but one of the replicas, and bounced the last 
replica to let it lead.

I will let Anshum share a post on the details of how to recover leaderless 
shards with all replicas in an inactive state.

> On Mar 14, 2019, at 8:01 PM, Aroop Ganguly  wrote:
> 
> Thanks Shalin, Shawn.
> 
> I ended up getting guidance from Anshum on this and we did indeed use the 
> delete-replica api to delete all but one of the replicas, and bouncing the 
> last replica  to let it lead.
> 
> I will let anshum share a post on the details of how to recover leader shards.



Re: Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Shalin Shekhar Mangar
What Shawn said.

DeleteShard API is supposed to be used either when using implicit routing
or when you have compositeId router but the shard has already been split
and therefore in an inactive state.

Delete Replica API is what you need if you want to delete an individual
replica.

On Fri, Mar 15, 2019 at 12:39 AM Shawn Heisey  wrote:

> On 3/14/2019 12:47 PM, Aroop Ganguly wrote:
> > I am trying to delete a shard from a collection using the collections
> > api for the same.
> > On the solr ui,  all the replicas are in “downed” state.
> >
> > However, when I run the delete shard
> >
> command: /solr/admin/collections?action=DELETESHARD&collection=x&shard=shard84
> > I get this exception:
> > {
> >"responseHeader":{
> >  "status":400,
> >  "QTime":14},
> >"Operation deleteshard caused
> >
> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>
> > The slice: shard35 is currently active. Only non-active
> > (or custom-hashed) slices can be deleted.",
>
> 
>
> > Why is this api thinking this slice is active ? When the Solr UI shows
> > all replicas down ?
>
> Active means the shard is considered part of the whole collection --
> included when you run a query, etc.
>
> Even though all replicas are down, the shard is still an active part of
> the index.  So you can't delete it.
>
> If your collection is typical and has compositeId routing, deleting a
> shard is really only possible after you have run SPLITSHARD and then you
> will only be able to delete the original shard that gets split.
>
> Aside from SPLITSHARD, I really have no idea how to mark a shard as
> inactive, but that will be required before you can delete it.
>
> Thanks,
> Shawn
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Thanks, Nish. It turned out to be other issue. I had not restarted one of
the node in the cluster which had become leader meanwhile.
It is good to know though that there is malformed XML in the example. I
will try to submit a document fix soon.

On Thu, Mar 14, 2019 at 5:37 PM Nish Karve  wrote:

> Arnold,
>
> Have you copied the configuration from the Solr docs? The bi directional
> cluster configuration (for cluster 1) has a malformed XML. It is missing
> the closing tag for the updateLogSynchronizer under the request handler
> configuration.
>
> Please disregard if you have already considered that in your configuration.
> I had a lot of issues trying to figure out the issue when I realized that
> it was a documentation error.
>
> Thanks
> Nishant
>
>
> On Thu, Mar 14, 2019, 2:54 PM Arnold Bronley  wrote:
>
> > Configuration is almost identical for both clusters in terms of cdcr
> except
> > for zkHost parameter configuration.
> >
> > On Thu, Mar 14, 2019 at 3:45 PM Arnold Bronley 
> > wrote:
> >
> > > Exactly. I have it defined in both clusters. I am following the
> > > instructions from here .
> > >
> >
> https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates
> > >
> > > On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar 
> > > wrote:
> > >
> > >> Hi Arnold,
> > >>
> > >> You need "cdcr-processor-chain" definitions in solrconfig.xml on both
> > >> clusters' collections. Both clusters need to act as source and target.
> > >>
> > >> Amrit Sarkar
> > >> Search Engineer
> > >> Lucidworks, Inc.
> > >> 415-589-9269
> > >> www.lucidworks.com
> > >> Twitter http://twitter.com/lucidworks
> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > >> Medium: https://medium.com/@sarkaramrit2
> > >>
> > >>
> > >> On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley <
> arnoldbron...@gmail.com
> > >
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues.
> > But
> > >> > after setting up bidirectional cdcr configuration, I am not able to
> > >> index a
> > >> > document.
> > >> >
> > >> > Following is the error that I am getting:
> > >> >
> > >> > Async exception during distributed update: Error from server at
> > >> > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
> > >> > request:
> > >> > http://host1
> > >> >
> > >> >
> > >>
> >
> :8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=
> > >> >
> > >>
> >
> http://host2:8983/solr/techproducts_shard1_replica_n1/&wt=javabin&version=2
> > >> > Remote error message: unknown UpdateRequestProcessorChain:
> > >> > cdcr-processor-chain
> > >> >
> > >> > Do you know why I might be getting this error?
> > >> >
> > >>
> > >
> >
>


Re: Solr/Tika config question

2019-03-14 Thread Erick Erickson
Tika is already distributed with Solr. It should “just work” since the path is 
already in solrconfig.xml
 

Other PDF converters? I’m sure there are, but Tika is free….

But I wouldn’t really recommend that you just ship the docs to Solr; I’d 
recommend that you build a little program to do the extraction on one or more 
clients. The details of why are here:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

It’s a little old, but the concepts are still valid. The RDBMS parts you can 
just rip out.
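As a rough sketch of that approach (the core URL and field names are placeholders; error handling and batching are omitted, and you need the SolrJ and Tika jars on the classpath):

  import java.io.File;
  import java.io.FileInputStream;
  import java.io.InputStream;

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.tika.metadata.Metadata;
  import org.apache.tika.parser.AutoDetectParser;
  import org.apache.tika.sax.BodyContentHandler;

  public class PdfIndexer {
    public static void main(String[] args) throws Exception {
      SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
      AutoDetectParser parser = new AutoDetectParser();

      for (File pdf : new File(args[0]).listFiles((dir, name) -> name.endsWith(".pdf"))) {
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
        Metadata metadata = new Metadata();
        try (InputStream in = new FileInputStream(pdf)) {
          parser.parse(in, handler, metadata); // Tika does the extraction client-side
        }
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", pdf.getAbsolutePath());
        doc.addField("title", metadata.get("title"));
        doc.addField("text", handler.toString());
        solr.add(doc);
      }
      solr.commit();
      solr.close();
    }
  }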

Best,
Erick

> On Mar 14, 2019, at 2:53 PM, Paul Buiocchi  wrote:
> 
> Greetings,
> I am setting up solr 8 on a vanilla Linux Ubuntu server (16.04)
> The whole reason for the setup is to index 1000s of PDF files (newspaper 
> scans).
> - I created my core and have Solr up and running.
> - I am assuming that I need Apache Tika to index the files.
> - Do I tie Tika into Solr via the SOLRCONFIG.XML file? If so, does anyone
> have a sample syntax?
> - Tika.jar? Server or client?
> - Are there other PDF converters other than Tika? If so, how do they compare?
> Any other advice/suggestions?
> Thank you all , I really appreciate the help !
> 
> Sent from Yahoo Mail on Android



Solr/Tika config question

2019-03-14 Thread Paul Buiocchi
Greetings,
I am setting up solr 8 on a vanilla Linux Ubuntu server (16.04)
The whole reason for the setup is to index 1000s of PDF files (newspaper scans).
- I created my core and have Solr up and running.
- I am assuming that I need Apache Tika to index the files.
- Do I tie Tika into Solr via the SOLRCONFIG.XML file? If so, does anyone have a sample syntax?
- Tika.jar? Server or client?
- Are there other PDF converters other than Tika? If so, how do they compare?
Any other advice/suggestions?
Thank you all , I really appreciate the help !

Sent from Yahoo Mail on Android

Re: Bidirectional CDCR not working

2019-03-14 Thread Nish Karve
Arnold,

Have you copied the configuration from the Solr docs? The bi directional
cluster configuration (for cluster 1) has a malformed XML. It is missing
the closing tag for the updateLogSynchronizer under the request handler
configuration.
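
For reference, a well-formed version of that block (per the CDCR config page) should look roughly like this, with the closing tag present inside the /cdcr request handler definition:

  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>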

Please disregard if you have already considered that in your configuration.
I had a lot of issues trying to figure out the issue when I realized that
it was a documentation error.

Thanks
Nishant


On Thu, Mar 14, 2019, 2:54 PM Arnold Bronley wrote:

> Configuration is almost identical for both clusters in terms of cdcr except
> for zkHost parameter configuration.
>
> On Thu, Mar 14, 2019 at 3:45 PM Arnold Bronley 
> wrote:
>
> > Exactly. I have it defined in both clusters. I am following the
> > instructions from here .
> >
> https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates
> >
> > On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar 
> > wrote:
> >
> >> Hi Arnold,
> >>
> >> You need "cdcr-processor-chain" definitions in solrconfig.xml on both
> >> clusters' collections. Both clusters need to act as source and target.
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> Medium: https://medium.com/@sarkaramrit2
> >>
> >>
> >> On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley  >
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues.
> But
> >> > after setting up bidirectional cdcr configuration, I am not able to
> >> index a
> >> > document.
> >> >
> >> > Following is the error that I am getting:
> >> >
> >> > Async exception during distributed update: Error from server at
> >> > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
> >> > request:
> >> > http://host1
> >> >
> >> >
> >>
> :8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=
> >> >
> >>
> http://host2:8983/solr/techproducts_shard1_replica_n1/&wt=javabin&version=2
> >> > Remote error message: unknown UpdateRequestProcessorChain:
> >> > cdcr-processor-chain
> >> >
> >> > Do you know why I might be getting this error?
> >> >
> >>
> >
>


Re: Commits and new document visibility

2019-03-14 Thread Christopher Schultz
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 3/14/19 10:46, Shawn Heisey wrote:
> On 3/14/2019 8:23 AM, Christopher Schultz wrote:
>> I believe that the only thing I want to do is to set the 
>> autoSoftCommit value to something "reasonable". I'll probably
>> start with maybe 15000 (15sec) to match the hard-commit setting
>> and see if we get any complaints about delays between "save" and
>> "seeing the user".
> 
> In my opinion, 15 seconds is far too frequent for opening a new 
> searcher.  If the index reaches any real size, you may be in a
> situation where the full soft commit takes longer than 15 seconds
> to complete - mostly due to warming or autowarming.  Commits that
> open a searcher can be very resource-intensive ... if they happen
> too frequently, then heavy indexing will cause your Solr instance
> to never "calm down" ... it will always be hitting the CPU and disk
> hard. I'd personally start with one minute and adjust from there
> based on how long the commits take.
Okay. Current core size is ~1M documents. I think users can live with
a 1-minute delay, but I'll have to ask :)

Is the log file the best resource for information on (soft)
commit-duration?

>> In our case, we don't have a huge number of documents being
>> created in a minute. Probably once per minute, if that.
>> 
>> Does that seem reasonable?
>> 
>> As for actually SETTING the setting, I'd prefer not to edit the 
>> solrconfig.xml document. Instead, can I set this in my
>> solr.in.sh script? I see an example like this right in the file:
>> 
>> SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"
> 
> 3 seconds is even more problematic than 15.

Sorry, that was just a copy/paste directly from the default solr.in.sh
script that ships with Solr. I wouldn't do a 3-second soft-commit.

> I believe that when you use "bin/solr create" to create an index
> with the default config, that it does set the autoSoftCommit to 3
> seconds. Which as I stated, I believe to be far too frequent.

Nope, it sets it to "never soft commit", unless the defaults have
changed since I built this service with, I think, 7.3.0.

Is there any way to change this value at runtime, or does it require a
service-restart?

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlyKxt8ACgkQHPApP6U8
pFg9ChAAkSgsvn3+xufyLM9bA8WIWqICwmDWRdFM9nbSiy4bDH1Zl/86FKjzcvbB
lmyVFYlpFGedcSKLVsqXGEZiu8n0YgR6iVw6udfIJOWzex5JkwUBUsmS6bHP5ZAj
8wkTyWPyBQVBSBWUxQnEzfrgJCFxzEbzBt8no0gt0f7vbgXm+HaFBkb+l2MQzTK9
wrhsLh36cb17ig+/w16Eo4Rq5VQ5f/P4Y7PkTfzS5CaWyPi16mTP8Z7vTxQ+ltHQ
IPAVnZ4U6Tx4hFxf2Ox99qRX5wAlX0lMD063Gx7Q348Xn+u8VH8Aur8hudnb9Icf
MK9OqU0bxdeWkhDxGDCuxY4h+t+kE1YI0cPI5KWTkBVAU24dCOAPkJQ0LMGs/rGR
B3KareFltLztowvM8rxOeNcLzeoKn1ZpWrtPuK9tuaCy9LnwxgfTOGJFRuzhzxPF
WHA7R4LtQrjjmAXV1a/BgkNVXXmGnq1qJNyICiV6nYS/ALJXKidrexgcyJ4FoWK4
uEcy/62mtbTVz7I4mdmkNH/vwjjOTxZy2FXfwoUIQYe9R2RHM9NbF0Fzzrvx3hQH
vp2GD+AhzhIQUuqBe50XqUkC0T199ZgR4YkCBX7LdPDPcv54QgAfgjfImidQAiqn
s+i/J/rBFZPTD2vAgix+A74UNpePrKhODt0GNg92J4NvTU8P9kM=
=FwiA
-END PGP SIGNATURE-


Re: Authorization fails but api still renders

2019-03-14 Thread Branham, Jeremy (Experis)
I’ve discovered the authorization works properly if I use the FQDN to access 
the Solr node, but the short hostname completely circumvents it.
They are all internal server clusters, so I’m using self-signed certificates 
[the same exact certificate] on each. The SAN portion of the cert contains the 
IP, short, and FQDN of each server.

I also diff’d the two servers Solr installation directories, and confirmed they 
are identical.
They are using the same exact versions of Java and zookeeper, with the same 
chroot configuration. [different zk clusters]

 
Jeremy Branham
jb...@allstate.com

On 3/14/19, 10:44 AM, "Branham, Jeremy (Experis)"  wrote:

I’m using Basic Auth on 3 different clusters.
On 2 of the clusters, authorization works fine. A 401 is returned when I 
try to access the core/collection apis.

On the 3rd cluster I can see the authorization failed, but the api results 
are still returned.

Solr.log
2019-03-14 09:25:47.680 INFO  (qtp1546693040-152) [   ] 
o.a.s.s.RuleBasedAuthorizationPlugin request has come without principal. failed 
permission {
  "name":"core-admin-read",
  "role":"*"}


I’m using different zookeeper clusters for each solr cluster, but using the 
same security.json contents.
I’ve tried refreshing the ZK node, and bringing the whole Solr cluster down 
and back up.

Is there some sort of caching that could be happening?

I wrote an installation script that I’ve used to setup each cluster, so I’m 
thinking I’ll wipe it out and re-run.
But before I do this, I thought I’d ask the community for input. Maybe a 
bug?


Jeremy Branham
jb...@allstate.com
Allstate Insurance Company | UCV Technology Services | Information Services 
Group





Re: Help with a DIH config file

2019-03-14 Thread Jörn Franke
Sorry for my late reply, and thanks for sharing.

Yes, this is possible.

Maybe my last mail was confusing. I hope the examples below help.

Alternative 1 - Use only DIH without update processor
tika-data-config-2.xml - add the transformer in the entity and the transformation in
each field (here done for id and for fulltext) - additionally set the
TikaEntityProcessor format to "text":
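
(The XML itself was stripped by the mail archive; a sketch of what such an entity could look like, assuming a FileListEntityProcessor feeding the TikaEntityProcessor - paths and column names are placeholders:)

  <dataConfig>
    <dataSource type="BinFileDataSource" name="bin"/>
    <document>
      <entity name="files" processor="FileListEntityProcessor" rootEntity="false"
              baseDir="/path/to/pdfs" fileName=".*\.pdf$" recursive="true">
        <entity name="tika" processor="TikaEntityProcessor" dataSource="bin"
                url="${files.fileAbsolutePath}" format="text"
                transformer="TemplateTransformer,RegexTransformer">
          <!-- build the id from the file path, then replace anything that is
               not a word character or dot with "/" -->
          <field column="id" template="${files.fileAbsolutePath}"
                 regex="[^\w|\.]" replaceWith="/"/>
          <!-- copy the extracted text and collapse line breaks -->
          <field column="fulltext" sourceColName="text" regex="\n|\r" replaceWith=" "/>
        </entity>
      </entity>
    </document>
  </dataConfig>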























Alternative 2 - Regex processor in solrconfig.xml - you need to put
everything into ONE chain

  <updateRequestProcessorChain name="my-chain">
    <!-- copy the extracted text into a separate fulltext field
         (element names reconstructed; the archive stripped the XML tags) -->
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">_text_</str>
      <str name="dest">fulltext</str>
    </processor>
    <!-- strip line breaks from both fields -->
    <processor class="solr.RegexReplaceProcessorFactory">
      <str name="fieldName">_text_</str>
      <str name="fieldName">fulltext</str>
      <str name="pattern">\n|\r</str>
      <str name="replacement"></str>
      <bool name="literalReplacement">true</bool>
    </processor>
    <!-- normalize id/url: replace anything that is not a word character or dot with "/" -->
    <processor class="solr.RegexReplaceProcessorFactory">
      <str name="fieldName">id</str>
      <str name="fieldName">url</str>
      <str name="pattern">[^\w|\.]</str>
      <str name="replacement">/</str>
      <bool name="literalReplacement">true</bool>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

[..]

  <requestHandler name="/dataimport" class="solr.DataImportHandler">
    <lst name="defaults">
      <str name="config">tika-data-config-2.xml</str>
      <str name="update.chain">my-chain</str>
    </lst>
  </requestHandler>



On Thu, Mar 14, 2019 at 6:41 AM wclarke  wrote:

> Got each one working individually, but not multiples.  Is it possible?
> Please see attached files.
>
> Thanks!!! tika-data-config-2.xml
> 
> solrconfig.xml
> 
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Configuration is almost identical for both clusters in terms of cdcr except
for zkHost parameter configuration.

On Thu, Mar 14, 2019 at 3:45 PM Arnold Bronley 
wrote:

> Exactly. I have it defined in both clusters. I am following the
> instructions from here .
> https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates
>
> On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar 
> wrote:
>
>> Hi Arnold,
>>
>> You need "cdcr-processor-chain" definitions in solrconfig.xml on both
>> clusters' collections. Both clusters need to act as source and target.
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> Medium: https://medium.com/@sarkaramrit2
>>
>>
>> On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley 
>> wrote:
>>
>> > Hi,
>> >
>> > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
>> > after setting up bidirectional cdcr configuration, I am not able to
>> index a
>> > document.
>> >
>> > Following is the error that I am getting:
>> >
>> > Async exception during distributed update: Error from server at
>> > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
>> > request:
>> > http://host1
>> >
>> >
>> :8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=
>> >
>> http://host2:8983/solr/techproducts_shard1_replica_n1/&wt=javabin&version=2
>> > Remote error message: unknown UpdateRequestProcessorChain:
>> > cdcr-processor-chain
>> >
>> > Do you know why I might be getting this error?
>> >
>>
>


Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Exactly. I have it defined in both clusters. I am following the
instructions from here .
https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates

On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar  wrote:

> Hi Arnold,
>
> You need "cdcr-processor-chain" definitions in solrconfig.xml on both
> clusters' collections. Both clusters need to act as source and target.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
>
> On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley 
> wrote:
>
> > Hi,
> >
> > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
> > after setting up bidirectional cdcr configuration, I am not able to
> index a
> > document.
> >
> > Following is the error that I am getting:
> >
> > Async exception during distributed update: Error from server at
> > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
> > request:
> > http://host1
> >
> >
> :8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=
> >
> http://host2:8983/solr/techproducts_shard1_replica_n1/&wt=javabin&version=2
> > Remote error message: unknown UpdateRequestProcessorChain:
> > cdcr-processor-chain
> >
> > Do you know why I might be getting this error?
> >
>


Re: ExactStatsCache not working for distributed IDF

2019-03-14 Thread Arnold Bronley
Hi,

I tried that as well. No change in scores.

On Thu, Mar 14, 2019 at 3:37 PM Michael Gibney 
wrote:

> Are you basing your conclusion (that it's not working as expected) on the
> scores as reported in the debug output? If you haven't already, try adding
> "score" to the "fl" param -- if different (for a given doc) than the score
> as reported in debug, then it's probably working as intended ... just a
> little confusing in the debug output.
>
> On Thu, Mar 14, 2019 at 3:23 PM Arnold Bronley 
> wrote:
>
> > Hi,
> >
> > I am using ExactStatsCache in SolrCloud (7.7.1) by adding following to
> > solrconfig.xml file for all collections. I restarted and indexed the
> > documents of all collections after this change just to be sure.
> >
> > <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
> >
> > However, when I do multi-collection query, the scores do not change
> before
> > and after adding ExactStatsCache. I can still see the docCount in debug
> > output coming from individual shards and not even from whole collection.
> I
> > was expecting that the docCount would be of addition of all docCounts of
> > all collections included in search query.
> >
> > Do you know what I might be doing wrong?
> >
>


Re: Bidirectional CDCR not working

2019-03-14 Thread Amrit Sarkar
Hi Arnold,

You need "cdcr-processor-chain" definitions in solrconfig.xml on both
clusters' collections. Both clusters need to act as source and target.
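
For reference, the chain from the CDCR documentation looks like this:

  <updateRequestProcessorChain name="cdcr-processor-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>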

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley 
wrote:

> Hi,
>
> I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
> after setting up bidirectional cdcr configuration, I am not able to index a
> document.
>
> Following is the error that I am getting:
>
> Async exception during distributed update: Error from server at
> http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
> request:
> http://host1
>
> :8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=
> http://host2:8983/solr/techproducts_shard1_replica_n1/&wt=javabin&version=2
> Remote error message: unknown UpdateRequestProcessorChain:
> cdcr-processor-chain
>
> Do you know why I might be getting this error?
>


Re: ExactStatsCache not working for distributed IDF

2019-03-14 Thread Michael Gibney
Are you basing your conclusion (that it's not working as expected) on the
scores as reported in the debug output? If you haven't already, try adding
"score" to the "fl" param -- if different (for a given doc) than the score
as reported in debug, then it's probably working as intended ... just a
little confusing in the debug output.

On Thu, Mar 14, 2019 at 3:23 PM Arnold Bronley 
wrote:

> Hi,
>
> I am using ExactStatsCache in SolrCloud (7.7.1) by adding following to
> solrconfig.xml file for all collections. I restarted and indexed the
> documents of all collections after this change just to be sure.
>
> <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
>
> However, when I do multi-collection query, the scores do not change before
> and after adding ExactStatsCache. I can still see the docCount in debug
> output coming from individual shards and not even from whole collection. I
> was expecting that the docCount would be of addition of all docCounts of
> all collections included in search query.
>
> Do you know what I might be doing wrong?
>


Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Hi,

I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
after setting up bidirectional cdcr configuration, I am not able to index a
document.

Following is the error that I am getting:

Async exception during distributed update: Error from server at
http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request request:
http://host1
:8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=
http://host2:8983/solr/techproducts_shard1_replica_n1/&wt=javabin&version=2
Remote error message: unknown UpdateRequestProcessorChain:
cdcr-processor-chain

Do you know why I might be getting this error?


ExactStatsCache not working for distributed IDF

2019-03-14 Thread Arnold Bronley
Hi,

I am using ExactStatsCache in SolrCloud (7.7.1) by adding following to
solrconfig.xml file for all collections. I restarted and indexed the
documents of all collections after this change just to be sure.

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>

However, when I do multi-collection query, the scores do not change before
and after adding ExactStatsCache. I can still see the docCount in debug
output coming from individual shards and not even from whole collection. I
was expecting that the docCount would be of addition of all docCounts of
all collections included in search query.

Do you know what I might be doing wrong?


Re: Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Shawn Heisey

On 3/14/2019 12:47 PM, Aroop Ganguly wrote:
I am trying to delete a shard from a collection using the collections 
api for the same.

On the solr ui,  all the replicas are in “downed” state.

However, when I run the delete shard 
command: /solr/admin/collections?action=DELETESHARD&collection=x&shard=shard84

I get this exception:
{
   "responseHeader":{
     "status":400,
     "QTime":14},
   "Operation deleteshard caused 
exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
The slice: shard35 is currently active. Only non-active 
(or custom-hashed) slices can be deleted.",




Why is this api thinking this slice is active ? When the Solr UI shows 
all replicas down ?


Active means the shard is considered part of the whole collection -- 
included when you run a query, etc.


Even though all replicas are down, the shard is still an active part of 
the index.  So you can't delete it.


If your collection is typical and has compositeId routing, deleting a 
shard is really only possible after you have run SPLITSHARD and then you 
will only be able to delete the original shard that gets split.


Aside from SPLITSHARD, I really have no idea how to mark a shard as 
inactive, but that will be required before you can delete it.


Thanks,
Shawn


Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Aroop Ganguly
Hi All

I am trying to delete a shard from a collection using the collections api for 
the same.
On the solr ui,  all the replicas are in “downed” state. 

However, when I run the delete shard command: 
/solr/admin/collections?action=DELETESHARD&collection=x&shard=shard84
I get this exception:
{
  "responseHeader":{
"status":400,
"QTime":14},
  "Operation deleteshard caused 
exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
 The slice: shard35 is currently active. Only non-active (or custom-hashed) 
slices can be deleted.",
  "exception":{
"msg":"The slice: shard35 is currently active. Only non-active (or 
custom-hashed) slices can be deleted.",
"rspCode":400},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"The slice: shard84 is currently active. Only non-active (or 
custom-hashed) slices can be deleted.",
"code":400}}


Why is this api thinking this slice is active ? When the Solr UI shows all 
replicas down ?



Thanks
Aroop



Re: Boolean Searches?

2019-03-14 Thread David Hastings
oh, thought it was implied with this:
" and also use the edismax query parser"



On Thu, Mar 14, 2019 at 11:38 AM Andy C  wrote:

> Dave,
>
> You don't mention what query parser you are using, but with the default
> query parser you can field qualify all the terms entered in a text box by
> surrounding them with parenthesis. So if you want to search against the
> 'title' field and they entered:
>
> train OR dragon
>
> You could generate the Solr query:
>
> title:(train OR dragon)
>
> Historically however Solr has not processed queries that contain a mixture
> of boolean operators as expected. The problem is described here:
> http://robotlibrarian.billdueber.com/2011/12/solr-and-boolean-operators/
>
> There is an open JIRA for this (
> https://issues.apache.org/jira/browse/SOLR-4023) so I assume the problem
> still exists in the most recent releases.
>
> On Thu, Mar 14, 2019 at 10:50 AM Dave Beckstrom 
> wrote:
>
> > Hi Everyone,
> >
> > I'm building a SOLR search application and the customer wants the search
> to
> > work like google search.
> >
> >
> > They want the user to be able to enter boolean searches like:
> >
> > train OR dragon.
> >
> > which would find any matches that has the word "train" or the word
> "dragon"
> > in the title.
> >
> > I know that the SOLR search would look like this:
> >
> > title:train OR title:dragon
> >
> > I am trying to avoid having to parse through what the user enters and
> build
> > out complex search strings.
> >
> > Is there any way that I can build a search against the "title" field
> where
> > if the user enters something like:
> >
> > train OR dragon AND 2
> >
> > it will honor the boolean AND/OR logic without my having to convert it
> into
> > somethng nasty like:
> >
> > title:train OR title:dragon AND title:2
> >
> >
> > Thank you!
> >
> > --
> > *Fig Leaf Software, Inc.*
> > https://www.figleaf.com/
> > 
> >
> > Full-Service Solutions Integrator
> >
> >
> >
> >
> >
> >
> >
>


RE: Duplicate values in Multi Value Fields

2019-03-14 Thread Gerald Bonfiglio
I've used this before, by specifying the chain as the default processor chain 
by putting the following directly under the <config> entry:

  <!-- reconstructed; only "uniq-fields" survived the mail archive -->
  <initParams path="/update/**">
    <lst name="defaults">
      <str name="update.chain">uniq-fields</str>
    </lst>
  </initParams>

Not sure if this is the best way, but since our app is the only one using Solr, 
we want every update to use the chain across all our collections.

-Original Message-
From: Alexis Aravena Silva [mailto:aarav...@itsofteg.com]
Sent: Thursday, March 14, 2019 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Duplicate values in Multi Value Fields

Does anyone know how to configure this in solrconfig? The idea is that Solr uses 
it when I execute the data import:





<updateRequestProcessorChain name="uniq-fields">
  <!-- element names reconstructed; only the literal values survived the mail archive -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldName">_nombreArea_</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="update.chain">uniq-fields</str>
  </lst>
</requestHandler>






From: Alexis Aravena Silva
Sent: Thursday, March 14, 2019 11:26:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Duplicate values in Multi Value Fields


I've tried with the following, but it doesn't work; it seems like Solr doesn't 
take the configuration:





<updateRequestProcessorChain name="uniq-fields">
  <!-- element names reconstructed; only the literal values survived the mail archive -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldName">_nombreArea_</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="update.chain">uniq-fields</str>
  </lst>
</requestHandler>




From: MUNENDRA S.N 
Sent: Thursday, March 14, 2019 11:17:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Duplicate values in Multi Value Fields

Probably you could use the add-distinct operation for unique values in multivalued
fields

https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html

On Thu, Mar 14, 2019, 7:40 PM Jörn Franke  wrote:

> With an update request processor
>
> https://lucene.apache.org/solr/7_4_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html
>
> > On 14.03.2019 at 15:01, Alexis Aravena Silva <
> aarav...@itsofteg.com> wrote:
> >
> > Hello,
> >
> >
> > I'm indexing data into some MultiValueFields, but  I have duplicates,
> how can I remove the duplicate values at indexing time?
> >
> >
> > I'm using Solr 7.
> >
> >
> > sample:
> >
> >
> > _nombreArea_":["MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA",
> "MICROBIOLOGÍA", "MICROBIOLOGÍA", "QUÍMICA", "QUÍMICA", "MICROBIOLOGÍA",
> "MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA", "QUÍMICA"],
> >
> >
> > Regards,
> >
> > Alexis Aravena S.
> >
> >
>






Re: Boolean Searches?

2019-03-14 Thread Andy C
Dave,

You don't mention what query parser you are using, but with the default
query parser you can field qualify all the terms entered in a text box by
surrounding them with parenthesis. So if you want to search against the
'title' field and they entered:

train OR dragon

You could generate the Solr query:

title:(train OR dragon)

Historically however Solr has not processed queries that contain a mixture
of boolean operators as expected. The problem is described here:
http://robotlibrarian.billdueber.com/2011/12/solr-and-boolean-operators/

There is an open JIRA for this (
https://issues.apache.org/jira/browse/SOLR-4023) so I assume the problem
still exists in the most recent releases.

On Thu, Mar 14, 2019 at 10:50 AM Dave Beckstrom 
wrote:

> Hi Everyone,
>
> I'm building a SOLR search application and the customer wants the search to
> work like google search.
>
>
> They want the user to be able to enter boolean searches like:
>
> train OR dragon.
>
> which would find any matches that has the word "train" or the word "dragon"
> in the title.
>
> I know that the SOLR search would look like this:
>
> title:train OR title:dragon
>
> I am trying to avoid having to parse through what the user enters and build
> out complex search strings.
>
> Is there any way that I can build a search against the "title" field where
> if the user enters something like:
>
> train OR dragon AND 2
>
> it will honor the boolean AND/OR logic without my having to convert it into
> somethng nasty like:
>
> title:train OR title:dragon AND title:2
>
>
> Thank you!
>
> --
> *Fig Leaf Software, Inc.*
> https://www.figleaf.com/
> 
>
> Full-Service Solutions Integrator
>
>
>
>
>
>
>


Authorization fails but api still renders

2019-03-14 Thread Branham, Jeremy (Experis)
I’m using Basic Auth on 3 different clusters.
On 2 of the clusters, authorization works fine. A 401 is returned when I try to 
access the core/collection apis.

On the 3rd cluster I can see the authorization failed, but the api results are 
still returned.

Solr.log
2019-03-14 09:25:47.680 INFO  (qtp1546693040-152) [   ] 
o.a.s.s.RuleBasedAuthorizationPlugin request has come without principal. failed 
permission {
  "name":"core-admin-read",
  "role":"*"}


I’m using different zookeeper clusters for each solr cluster, but using the 
same security.json contents.
I’ve tried refreshing the ZK node, and bringing the whole Solr cluster down and 
back up.

Is there some sort of caching that could be happening?

I wrote an installation script that I’ve used to setup each cluster, so I’m 
thinking I’ll wipe it out and re-run.
But before I do this, I thought I’d ask the community for input. Maybe a bug?


Jeremy Branham
jb...@allstate.com
Allstate Insurance Company | UCV Technology Services | Information Services 
Group



Re: Duplicate values in Multi Value Fields

2019-03-14 Thread Alexis Aravena Silva
Does anyone know how to configure this in solrconfig? The idea is that Solr uses 
it when I execute the data import:





<updateRequestProcessorChain name="uniq-fields">
  <!-- element names reconstructed; only the literal values survived the mail archive -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldName">_nombreArea_</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="update.chain">uniq-fields</str>
  </lst>
</requestHandler>






From: Alexis Aravena Silva
Sent: Thursday, March 14, 2019 11:26:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Duplicate values in Multi Value Fields


I've tried with the following, but it doesn't work; it seems like Solr doesn't 
take the configuration:





<updateRequestProcessorChain name="uniq-fields">
  <!-- element names reconstructed; only the literal values survived the mail archive -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldName">_nombreArea_</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="update.chain">uniq-fields</str>
  </lst>
</requestHandler>




From: MUNENDRA S.N 
Sent: Thursday, March 14, 2019 11:17:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Duplicate values in Multi Value Fields

Probably you could add-distinct operation for unique values in multivalued
fields

https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html

On Thu, Mar 14, 2019, 7:40 PM Jörn Franke  wrote:

> With an update request processor
>
> https://lucene.apache.org/solr/7_4_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html
>
> > On 14.03.2019 at 15:01, Alexis Aravena Silva <
> aarav...@itsofteg.com> wrote:
> >
> > Hello,
> >
> >
> > I'm indexing data into some MultiValueFields, but  I have duplicates,
> how can I remove the duplicate values at indexing time?
> >
> >
> > I'm using Solr 7.
> >
> >
> > sample:
> >
> >
> > _nombreArea_":["MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA",
> "MICROBIOLOGÍA", "MICROBIOLOGÍA", "QUÍMICA", "QUÍMICA", "MICROBIOLOGÍA",
> "MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA", "QUÍMICA"],
> >
> >
> > Regards,
> >
> > Alexis Aravena S.
> >
> >
>


RE: FieldTypes and LowerCase

2019-03-14 Thread Moyer, Brett
Ok I think I'm getting it. At index/query time the analyzers fire and "do 
stuff". Ex: "the sheep jumped over the MOON" could be tokenized on spaces, 
lowercased, etc., and that is stored in the inverted index, something you 
probably can't really see.

In solr the string above is what you see in its original form. When you search 
for "sheep" that would come back because the Inverted Index has it stored in 
that form, separated words based on spaces, right? Further if I searched for 
moon (lowercase) it would be found because the analyzer is also storing in the 
Inverted Index the lowercase form, right?

I'm getting closer I think. Ok so if I want to physically lowercase the URL and 
store it that way, I need to do it before it gets to the Index as you stated. 
Ok got it, Thanks!

Brett Moyer
Manager, Sr. Technical Lead | TFS Technology
  Public Production Support
  Digital Search & Discovery

8625 Andrew Carnegie Blvd | 4th floor
Charlotte, NC 28263
Tel: 704.988.4508
Fax: 704.988.4907
bmo...@tiaa.org 


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Thursday, March 14, 2019 10:57 AM
To: solr-user@lucene.apache.org
Subject: Re: FieldTypes and LowerCase



On 3/14/2019 8:49 AM, Moyer, Brett wrote:
> Thanks Shawn. "Analysis only happens to indexed data" - that being the case,
> when the data gets indexed, wouldn't the analyzer kick off and lowercase the
> URL? The analyzer I have defined is not set for index or query, so as I
> understand it, it will fire during both events. If that is the case, I still
> don't get why the lowercasing doesn't fire when the data is being indexed.

It does happen for both index and query.

It sounds like you are assuming that when index analysis happens, that
what you get back in search results will be affected by that analysis.

What you get back in search results is stored data -- that is never
affected by analysis.

What gets affected by analysis is indexed data -- the data that is
searched by queries.  Not the data that comes back in search results.

Thanks,
Shawn


Re: FieldTypes and LowerCase

2019-03-14 Thread Shawn Heisey

On 3/14/2019 8:49 AM, Moyer, Brett wrote:

Thanks Shawn. "Analysis only happens to indexed data" - that being the case, when the 
data gets indexed, wouldn't the analyzer kick off and lowercase the URL? The analyzer 
I have defined is not set for index or query, so as I understand it, it will fire during both 
events. If that is the case, I still don't get why the lowercasing doesn't fire when the 
data is being indexed.


It does happen for both index and query.

It sounds like you are assuming that when index analysis happens, that 
what you get back in search results will be affected by that analysis.


What you get back in search results is stored data -- that is never 
affected by analysis.


What gets affected by analysis is indexed data -- the data that is 
searched by queries.  Not the data that comes back in search results.


Thanks,
Shawn


Re: Boolean Searches?

2019-03-14 Thread David Hastings
If you make your default operator "OR" (or set q.op), and also use the
edismax query parser, you can use the qf param to boost the title heavily
compared to the default field you are using. For example, I use something
like this, which may be overkill:
title^100 description^50 topic^30 text
I also have the same in my pf value as well,
but it works for me.
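
A full request along those lines might use parameters like these (a sketch; the field list is just the example above):

  q=train OR dragon
  defType=edismax
  q.op=OR
  qf=title^100 description^50 topic^30 text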

On Thu, Mar 14, 2019 at 10:50 AM Dave Beckstrom 
wrote:

> Hi Everyone,
>
> I'm building a SOLR search application and the customer wants the search to
> work like google search.
>
>
> They want the user to be able to enter boolean searches like:
>
> train OR dragon.
>
> which would find any matches that has the word "train" or the word "dragon"
> in the title.
>
> I know that the SOLR search would look like this:
>
> title:train OR title:dragon
>
> I am trying to avoid having to parse through what the user enters and build
> out complex search strings.
>
> Is there any way that I can build a search against the "title" field where
> if the user enters something like:
>
> train OR dragon AND 2
>
> it will honor the boolean AND/OR logic without my having to convert it into
> somethng nasty like:
>
> title:train OR title:dragon AND title:2
>
>
> Thank you!
>
> --
> *Fig Leaf Software, Inc.*
> https://www.figleaf.com/
> 
>
> Full-Service Solutions Integrator
>
>
>
>
>
>
>


Re: Solr collection indexed to pdf in hdfs throws error during solr restart

2019-03-14 Thread Shawn Heisey

On 3/14/2019 1:13 AM, VAIBHAV SHUKLA shuklavaibha...@yahoo.in wrote:

When I restart Solr it throws the following error. Solr collection indexed to 
pdf in hdfs throws error during solr restart.

Error





Caused by: org.apache.lucene.store.LockObtainFailedException: Index dir 
'hdfs://192.168.1.16:8020/PDFIndex/data/index/' of core 'PDFIndex' is already 
locked. The most likely cause is another Solr server (or another solr core in 
this server) also configured to use this directory; other possible causes may 
be specific to lockType: hdfs


Solr has been shut down forcefully, so the lockfile is remaining in the 
core's directory (which in your case is in HDFS).  A graceful shutdown 
would have deleted the lockfile.


What version of Solr, and what OS do you have it running on?

For a while now, on non-windows operating systems, the "stop" action in 
the bin/solr script has waited up to 3 minutes for Solr to gracefully 
shut down before forcefully killing it.  This has eliminated most of 
these problems when running on one of those operating systems.


On Windows, the bin\solr script is only waiting 5 seconds before 
forcefully killing Solr, which used to happen on all operating systems. 
This is extremely likely to cause problems like this.  Fixing this on 
Windows is on the radar, but in general we lack adept skill with 
Windows, so it's not proceeding quickly.


I'm having trouble locating the issue for fixing the problem on Windows.

To fix it, find the "write.lock" file in your core's HDFS storage 
location and delete it.
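
Assuming the path from the error above, something like the following should do it (a sketch; double-check the path and make sure Solr is fully stopped first):

  hdfs dfs -rm hdfs://192.168.1.16:8020/PDFIndex/data/index/write.lock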


Thanks,
Shawn


Boolean Searches?

2019-03-14 Thread Dave Beckstrom
Hi Everyone,

I'm building a SOLR search application and the customer wants the search to
work like google search.


They want the user to be able to enter boolean searches like:

train OR dragon.

which would find any matches that has the word "train" or the word "dragon"
in the title.

I know that the SOLR search would look like this:

title:train OR title:dragon

I am trying to avoid having to parse through what the user enters and build
out complex search strings.

Is there any way that I can build a search against the "title" field where
if the user enters something like:

train OR dragon AND 2

it will honor the boolean AND/OR logic without my having to convert it into
somethng nasty like:

title:train OR title:dragon AND title:2


Thank you!

-- 
*Fig Leaf Software, Inc.* 
https://www.figleaf.com/ 
  

Full-Service Solutions Integrator








RE: FieldTypes and LowerCase

2019-03-14 Thread Moyer, Brett
Thanks Shawn. "Analysis only happens to indexed data" - that being the case, when the 
data gets indexed, wouldn't the analyzer kick off and lowercase the URL? 
The analyzer I have defined is not set for index or query, so as I understand 
it, it will fire during both events. If that is the case, I still don't get why the 
lowercasing doesn't fire when the data is being indexed. 

Brett Moyer

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Thursday, March 14, 2019 10:44 AM
To: solr-user@lucene.apache.org
Subject: Re: FieldTypes and LowerCase



On 3/14/2019 7:47 AM, Moyer, Brett wrote:
> I'm using the below FieldType/Field but when I index my documents, the URL is 
> not being lower case. Any ideas? Do I have the below wrong?
>
> Example: http://connect.rightprospectus.com/RSVP/TADF
> Expect: http://connect.rightprospectus.com/rsvp/tadf
>
> <fieldType name="..." class="solr.TextField" omitNorms="true">
>   <analyzer>
>     <tokenizer class="..."/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <field name="url" type="..." indexed="true" stored="true"/>

Analysis only happens to indexed data.

The data that you get back from Solr (stored data) is *always* EXACTLY
what Solr indexes, before analysis.

You'll need to lowercase the data before it reaches analysis.  This is
how it is designed to work ... that will not be changing.

If you were to configure an Update Processor chain that did the
lowercasing, that would affect stored data as well as indexed data.

Thanks,
Shawn



Re: Commits and new document visibility

2019-03-14 Thread Shawn Heisey

On 3/14/2019 8:23 AM, Christopher Schultz wrote:

I believe that the only thing I want to do is to set the
autoSoftCommit value to something "reasonable". I'll probably start
with maybe 15000 (15sec) to match the hard-commit setting and see if
we get any complaints about delays between "save" and "seeing the user".


In my opinion, 15 seconds is far too frequent for opening a new 
searcher.  If the index reaches any real size, you may be in a situation 
where the full soft commit takes longer than 15 seconds to complete - 
mostly due to warming or autowarming.  Commits that open a searcher can 
be very resource-intensive ... if they happen too frequently, then heavy 
indexing will cause your Solr instance to never "calm down" ... it will 
always be hitting the CPU and disk hard.


I'd personally start with one minute and adjust from there based on how 
long the commits take.



In our case, we don't have a huge number of documents being created in
  a minute. Probably once per minute, if that.

Does that seem reasonable?

As for actually SETTING the setting, I'd prefer not to edit the
solrconfig.xml document. Instead, can I set this in my solr.in.sh
script? I see an example like this right in the file:

SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"


3 seconds is even more problematic than 15.

I believe that when you use "bin/solr create" to create an index with 
the default config, that it does set the autoSoftCommit to 3 seconds. 
Which as I stated, I believe to be far too frequent.


Thanks,
Shawn


Re: FieldTypes and LowerCase

2019-03-14 Thread Shawn Heisey

On 3/14/2019 7:47 AM, Moyer, Brett wrote:

I'm using the below FieldType/Field but when I index my documents, the URL is 
not being lower case. Any ideas? Do I have the below wrong?

Example: http://connect.rightprospectus.com/RSVP/TADF
Expect: http://connect.rightprospectus.com/rsvp/tadf



   
   






Analysis only happens to indexed data.

The data that you get back from Solr (stored data) is *always* EXACTLY 
what Solr indexes, before analysis.


You'll need to lowercase the data before it reaches analysis.  This is 
how it is designed to work ... that will not be changing.


If you were to configure an Update Processor chain that did the 
lowercasing, that would affect stored data as well as indexed data.


Thanks,
Shawn


Re: [ANNOUNCE] Apache Solr 8.0.0 released

2019-03-14 Thread Toke Eskildsen
On Thu, 2019-03-14 at 13:16 +0100, jim ferenczi wrote:
> http://lucene.apache.org/solr/8_0_0/changes/Changes.html

Thank you for the hard work of rolling the release!
Looking forward to upgrading.

- Toke Eskildsen, Royal Danish Library




Re: Duplicate values in Multi Value Fields

2019-03-14 Thread Alexis Aravena Silva
I've tried with the following, but it doesn't work; it seems like Solr doesn't 
take the configuration:





<updateRequestProcessorChain name="uniq-fields">
  <!-- element names reconstructed; only the literal values survived the mail archive -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldName">_nombreArea_</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="update.chain">uniq-fields</str>
  </lst>
</requestHandler>




From: MUNENDRA S.N 
Sent: Thursday, March 14, 2019 11:17:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Duplicate values in Multi Value Fields

Probably you could use the add-distinct operation for unique values in multivalued
fields

https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html

On Thu, Mar 14, 2019, 7:40 PM Jörn Franke  wrote:

> With an update request processor
>
> https://lucene.apache.org/solr/7_4_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html
>
> > On 14.03.2019 at 15:01, Alexis Aravena Silva <
> aarav...@itsofteg.com> wrote:
> >
> > Hello,
> >
> >
> > I'm indexing data into some MultiValueFields, but  I have duplicates,
> how can I remove the duplicate values at indexing time?
> >
> >
> > I'm using Solr 7.
> >
> >
> > sample:
> >
> >
> > _nombreArea_":["MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA",
> "MICROBIOLOGÍA", "MICROBIOLOGÍA", "QUÍMICA", "QUÍMICA", "MICROBIOLOGÍA",
> "MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA", "QUÍMICA"],
> >
> >
> > Regards,
> >
> > Alexis Aravena S.
> >
> >
>


Commits and new document visibility

2019-03-14 Thread Christopher Schultz
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

I recently had a situation where a document wasn't findable in a
fairly small Solr core/collection and I didn't see any errors in
either the application using Solr or within Solr itself. A Solr
service restart caused the document to become visible.

So I started reading.

I believe the "problem" is that the document was indexed but not
visible due to the default commit settings in Solr 7.5 -- which is the
version  I happen to be running right now.

I never bothered so change anything from the defaults because, well, I
didn't know what I was doing. Now that I (a) have a problem to solve
and (b) know a little more about what is happening, I just wanted a
quick sanity-check on what I'd like to do.

[Quick background: my core/collection stores user data so that other
users can quickly find anyone in the system via text-search. This
replaced our previous RDBMS-based "SELECT ... WHERE name LIKE
'%whatever%'" implementation which of course wasn't scaling well.
Generally, users will expect that when a new user is created, they
will be findable "fairly soon" (probably immediately) afterwards.]

We are using SolrJ as a client from our application, btw.

Initially, we were doing:

SolrInputDocument document = ...;
SolrClient solr = ...;
solr.add(document);
solr.commit();

Someone told me that committing after every document-add was wasteful
and it seemed like good advice -- allow Solr's autoCommit mechanism to
handle the commits and we'll get better performance. The problem was
that no new documents are visible unless we take additional action.

So, here's the default settings:

autoCommit   = max 15sec
openSearcher = false

autoSoftCommit = never[*]
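
For reference, the solrconfig.xml equivalent of those defaults (as shipped in recent stock configsets) is roughly:

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>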

This means that every 15 seconds (plus OS/disk sync time), I'll get a
safe snapshot of the data. I'm okay with losing 15 seconds worth of
data if there is some catastrophe.

It also means that my documents are pretty much never made visible.

I believe that the only thing I want to do is to set the
autoSoftCommit value to something "reasonable". I'll probably start
with maybe 15000 (15sec) to match the hard-commit setting and see if
we get any complaints about delays between "save" and "seeing the user".

In our case, we don't have a huge number of documents being created in
 a minute. Probably once per minute, if that.

Does that seem reasonable?

As for actually SETTING the setting, I'd prefer not to edit the
solrconfig.xml document. Instead, can I set this in my solr.in.sh
script? I see an example like this right in the file:

SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"

Is that a fairly standard way to set the autoSoftCommit value for all
cores?

Thanks,
- -chris

[*] This setting is documented only in a single place: in the
"near-real-time" documentation. It would be nice if that special value
was called-out in other places so it wasn't so hard to find.
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlyKY9wACgkQHPApP6U8
pFhxzRAAnxLCMPSFwJxChXZ8q7UJ9hHAGyMPHNs3k0tFilt9/aT+eR7rUEFGupvR
anl+o7QNU8fOreF/l0KoFeGpjNLHZqEJRSKrZkaEb0PH3gabH5IKpgwY9hr+CS9N
bcKC7GwQAs19TdkTorxY+MIBeQo0/bO51Ux7XallzYPdX6BW/+kRGlHCuiAQj3fg
+EwQan0iXLslk/bDxvCvg95B1zlvr7R4iRAOwp9GxIsk4tL8X/B7sOS5pm0RK19/
tiVJuAqTBwD2fQ3lZ1oQftadKMuajgedJdrrgd94jCuwzWVLjJpIXql2AKA/QcsM
7e2zJqOsPy/4eGFUJ+St5/JYxFfm/yzFjV4rTW1/wng65mmbYAGpLsQ3A+05A8s1
o8ciDQ/80/fvnislr3/NGxZF5hSMjJG4xVriDWpdHX+PqfbqfpeaWnR4j8HEP3vy
tPklo3MflnPLk0oA6wqvjSX32ujucVd+X5tKKtkqnE6rorD41FpJGVRvgUrq7Zof
kwNro/r7ObqD72hioJJIkjol3ImL3NGSyeZ6XZtsKx+kEsGoyvW5lsRtC580ksXN
tYaJbCWQbrHmXnf3ooQV0PatQi0YkG70BQceKPXNQJ3l8Fmc2MjrP7aJ9//ptrMl
Pvc0qh4mpzGJKMBjSjaItadmouZdc3dn308xP4WIvpt2a4RYmjo=
=PrAt
-END PGP SIGNATURE-


Re: Duplicate values in Multi Value Fields

2019-03-14 Thread MUNENDRA S.N
Probably you could use the add-distinct operation for unique values in multivalued
fields

https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html
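
For example, an atomic update along these lines (the id is a placeholder) only appends values that are not already present:

  { "id": "doc1", "_nombreArea_": { "add-distinct": ["MICROBIOLOGÍA", "QUÍMICA"] } }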

On Thu, Mar 14, 2019, 7:40 PM Jörn Franke  wrote:

> With an update request processor
>
> https://lucene.apache.org/solr/7_4_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html
>
> > On 14.03.2019 at 15:01, Alexis Aravena Silva <
> aarav...@itsofteg.com> wrote:
> >
> > Hello,
> >
> >
> > I'm indexing data into some MultiValueFields, but  I have duplicates,
> how can I remove the duplicate values at indexing time?
> >
> >
> > I'm using Solr 7.
> >
> >
> > sample:
> >
> >
> > _nombreArea_":["MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA",
> "MICROBIOLOGÍA", "MICROBIOLOGÍA", "QUÍMICA", "QUÍMICA", "MICROBIOLOGÍA",
> "MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA", "QUÍMICA"],
> >
> >
> > Regards,
> >
> > Alexis Aravena S.
> >
> >
>


Re: Duplicate values in Multi Value Fields

2019-03-14 Thread Jörn Franke
With an update request processor
https://lucene.apache.org/solr/7_4_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html
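
For reference, a minimal chain sketch (the field name is taken from the
example below; the chain name is arbitrary):

<updateRequestProcessorChain name="dedupe-values">
  <!-- drop duplicate values from the named multivalued field at index time -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldName">_nombreArea_</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The chain then needs to be selected per update request with
update.chain=dedupe-values, or marked default="true".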

> On 14.03.2019 at 15:01, Alexis Aravena Silva wrote:
> 
> Hello,
> 
> 
> I'm indexing data into some MultiValueFields, but  I have duplicates, how can 
> I remove the duplicate values at indexing time?
> 
> 
> I'm using Solr 7.
> 
> 
> sample:
> 
> 
> _nombreArea_":["MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA", "MICROBIOLOGÍA", 
> "MICROBIOLOGÍA", "QUÍMICA", "QUÍMICA", "MICROBIOLOGÍA", "MICROBIOLOGÍA", 
> "QUÍMICA", "MICROBIOLOGÍA", "QUÍMICA"],
> 
> 
> Regards,
> 
> Alexis Aravena S.
> 
> 


Duplicate values in Multi Value Fields

2019-03-14 Thread Alexis Aravena Silva
Hello,


I'm indexing data into some multivalued fields, but I have duplicates.
How can I remove the duplicate values at indexing time?


I'm using Solr 7.


sample:


"_nombreArea_": ["MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA", "MICROBIOLOGÍA",
"MICROBIOLOGÍA", "QUÍMICA", "QUÍMICA", "MICROBIOLOGÍA", "MICROBIOLOGÍA",
"QUÍMICA", "MICROBIOLOGÍA", "QUÍMICA"],


Regards,

Alexis Aravena S.




FieldTypes and LowerCase

2019-03-14 Thread Moyer, Brett
I'm using the fieldType/field below, but when I index my documents, the
URL is not being lowercased. Any ideas? Do I have the definition wrong?

Example: http://connect.rightprospectus.com/RSVP/TADF
Expect: http://connect.rightprospectus.com/rsvp/tadf

[The fieldType and field definitions were stripped from the archived message.]

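Since the original definitions are gone, here is a typical lowercasing
setup for comparison -- a sketch only, with the field name "url" assumed:

<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- keep the whole URL as a single token, then lowercase it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="url" type="lowercase" indexed="true" stored="true"/>

Note that an analyzer only changes the indexed terms; the stored value
returned in search results keeps its original case, which may be exactly
what is being observed here.
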
Brett Moyer




Re: NPE deleting expired docs (SOLR-13281)

2019-03-14 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Thank you for sharing that 7.6 has the same issue.

If anyone is interested in delving into the code to investigate further, I've 
added short steps on https://issues.apache.org/jira/browse/SOLR-13281 as to how 
one could potentially make a start on that.

From: solr-user@lucene.apache.org  At: 03/13/19 08:45:12  To: solr-user@lucene.apache.org
Subject: Re: NPE deleting expired docs (SOLR-13281)

We have the same issue on Solr 7.6.

On 12.03.2019 16:05, Gerald Bonfiglio wrote:
> Has anyone else observed NPEs attempting to have expired docs removed?
> I'm seeing the following exceptions:
>
> 2019-02-28 04:06:34.849 ERROR (autoExpireDocs-30-thread-1) [ ] o.a.s.u.p.DocExpirationUpdateProcessorFactory Runtime error in periodic deletion of expired docs: null
> java.lang.NullPointerException: null
> at org.apache.solr.update.processor.DistributedUpdateProcessor.handleReplicationFactor(DistributedUpdateProcessor.java:992) ~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:23:46]
> at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:960) ~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:23:46]
>
> Seems all that's required to reproduce it is to include DocExpirationUpdateProcessorFactory in an updateRequestProcessorChain.
>
> More details can be found at: https://issues.apache.org/jira/projects/SOLR/issues/SOLR-13281
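
For anyone wanting to reproduce this, a minimal chain along those lines
might look like the following (field names are illustrative; the
parameters follow the factory's documented configuration):

<updateRequestProcessorChain name="doc-expiry" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- how often the background deletion sweep runs -->
    <int name="autoDeletePeriodSeconds">30</int>
    <!-- computes expire_at_dt from a per-document TTL field -->
    <str name="ttlFieldName">time_to_live_s</str>
    <str name="expirationFieldName">expire_at_dt</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>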




[ANNOUNCE] Apache Solr 8.0.0 released

2019-03-14 Thread jim ferenczi
14 March 2019, Apache Solr™ 8.0.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 8.0.0

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

The release is available for immediate download at:

http://www.apache.org/dyn/closer.lua/lucene/solr/8.0.0

Please read CHANGES.txt for a detailed list of changes:

https://lucene.apache.org/solr/8_0_0/changes/Changes.html

Solr 8.0.0 Release Highlights
* Solr now uses HTTP/2 for inter-node communication

Being a major release, Solr 8 removes many deprecated APIs and changes
various parameter defaults and behaviors. Some changes may require a re-index of
your content. You are thus encouraged to thoroughly read the "Upgrade
Notes" at http://lucene.apache.org/solr/8_0_0/changes/Changes.html or in
the CHANGES.txt file accompanying the release.

Solr 8.0 also includes many other new features as well as numerous
optimizations and bugfixes of the corresponding Apache Lucene release.

Please report any feedback to the mailing lists (
http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.


Re: Solr collection indexed to pdf in hdfs throws error during solr restart

2019-03-14 Thread Jason Gerlowski
> When I restart Solr

How exactly are you restarting Solr?  Are you running a "bin/solr
restart"?  Or is Solr already shut down and you're just starting it
back up with a "bin/solr start"?  Depending on how Solr
was shut down, you might be running into a bit of a known-issue with
Solr's HDFS support.  Solr creates lock files for each index, to
restrict who can write to that index in the interest of avoiding race
conditions and protecting against file corruption.  Often when Solr
crashes or is shut down abruptly (via a "kill -9") it doesn't have
time to clean up these lock files and it fails to start up the next
time because it is still locked out from touching that index.  This
might be what you're running in to.  In which case you could carefully
make sure that no Solr nodes are using the index in question, delete
the lock file manually out of HDFS, and try starting Solr again.
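
Concretely, that would look something like the sketch below -- the index
path is taken from the stack trace in your message, and the file name
write.lock assumes the default HDFS lock factory behavior. Only do this
after confirming that no Solr node still has the core loaded:

# check whether a stale lock file is present
hdfs dfs -ls hdfs://192.168.1.16:8020/PDFIndex/data/index/

# remove the leftover lock, then start Solr again
hdfs dfs -rm hdfs://192.168.1.16:8020/PDFIndex/data/index/write.lock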

The advice above is what we usually tell people with write.lock issues
on HDFS...though some elements of the stack trace you provided make me
wonder whether you're seeing the same exact problem.  Your stack trace
has a NullPointerException, and a "Filesystem Closed" error (typically
seen when a Java object gets closed too early and may indicate a bug).
I'm not used to seeing either of these associated with the "standard"
write.lock issues.  What version of Solr are you seeing this on?

Best regards,

Jason

On Thu, Mar 14, 2019 at 5:28 AM VAIBHAV SHUKLA
shuklavaibha...@yahoo.in  wrote:
>
> When I restart Solr it throws the following error. Solr collection indexed to 
> pdf in hdfs throws error during solr restart.
>
>
>
> Error
>
> java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: Unable to create core [PDFIndex]
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.solr.core.CoreContainer.lambda$load$6(CoreContainer.java:594)
> at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.solr.common.SolrException: Unable to create core 
> [PDFIndex]
> at 
> org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:966)
> at 
> org.apache.solr.core.CoreContainer.lambda$load$5(CoreContainer.java:565)
> at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
> ... 5 more
> Caused by: org.apache.solr.common.SolrException: Index dir 
> 'hdfs://192.168.1.16:8020/PDFIndex/data/index/' of core 'PDFIndex' is already 
> locked. The most likely cause is another Solr server (or another solr core in 
> this server) also configured to use this directory; other possible causes may 
> be specific to lockType: hdfs
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:977)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:830)
> at 
> org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:950)
> ... 7 more
> Caused by: org.apache.lucene.store.LockObtainFailedException: Index dir 
> 'hdfs://192.168.1.16:8020/PDFIndex/data/index/' of core 'PDFIndex' is already 
> locked. The most likely cause is another Solr server (or another solr core in 
> this server) also configured to use this directory; other possible causes may 
> be specific to lockType: hdfs
> at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:712)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:923)
> ... 9 more
> 2018-12-22 07:55:13.431 ERROR 
> (OldIndexDirectoryCleanupThreadForCore-PDFIndex) [   x:PDFIndex] 
> o.a.s.c.HdfsDirectoryFactory Error checking for old index directories to 
> clean-up.
> java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2083)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2069)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:791)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
> at 
> 

Solr collection indexed to pdf in hdfs throws error during solr restart

2019-03-14 Thread VAIBHAV SHUKLA shuklavaibha...@yahoo.in
When I restart Solr it throws the following error. The collection
indexes PDF files stored in HDFS.



Error

java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: 
Unable to create core [PDFIndex]
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.solr.core.CoreContainer.lambda$load$6(CoreContainer.java:594)
at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Unable to create core 
[PDFIndex]
at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:966)
at org.apache.solr.core.CoreContainer.lambda$load$5(CoreContainer.java:565)
at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
... 5 more
Caused by: org.apache.solr.common.SolrException: Index dir 
'hdfs://192.168.1.16:8020/PDFIndex/data/index/' of core 'PDFIndex' is already 
locked. The most likely cause is another Solr server (or another solr core in 
this server) also configured to use this directory; other possible causes may 
be specific to lockType: hdfs
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:977)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:830)
at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:950)
... 7 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Index dir 
'hdfs://192.168.1.16:8020/PDFIndex/data/index/' of core 'PDFIndex' is already 
locked. The most likely cause is another Solr server (or another solr core in 
this server) also configured to use this directory; other possible causes may 
be specific to lockType: hdfs
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:712)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:923)
... 9 more
2018-12-22 07:55:13.431 ERROR (OldIndexDirectoryCleanupThreadForCore-PDFIndex) 
[   x:PDFIndex] o.a.s.c.HdfsDirectoryFactory Error checking for old index 
directories to clean-up.
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2083)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2069)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:791)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
at 
org.apache.solr.core.HdfsDirectoryFactory.cleanupOldIndexDirectories(HdfsDirectoryFactory.java:546)
at 
org.apache.solr.core.SolrCore.lambda$cleanupOldIndexDirectories$19(SolrCore.java:3050)
at java.lang.Thread.run(Thread.java:748)
2018-12-22 07:55:13.433 ERROR (OldIndexDirectoryCleanupThreadForCore-PDFIndex) 
[   x:PDFIndex] o.a.s.c.SolrCore Failed to cleanup old index directories for 
core PDFIndex
java.lang.NullPointerException
at 
org.apache.solr.core.HdfsDirectoryFactory.cleanupOldIndexDirectories(HdfsDirectoryFactory.java:564)
at 
org.apache.solr.core.SolrCore.lambda$cleanupOldIndexDirectories$19(SolrCore.java:3050)
at java.lang.Thread.run(Thread.java:748)

I have created a collection in Solr that indexes PDF files, and it is
indexing all the PDFs stored in HDFS.

Thanks & Regards
Vaibhav Shukla
Sent from Mail for Windows 10




Re: solr search Ontology based data set

2019-03-14 Thread Charlie Hull

On 13/03/2019 17:01, Jie Luo wrote:

Hi all,

I have several ontology-based data sets, and I would like to use Solr as the
search engine. Solr documents are flat, so I would like to know the best way
to handle the search.

Simple search is fine, but one search I will need is retrieving the ontology
tree or graph.

Best regards

Jie


Are you aware of the BioSolr project? Have a chat with Sameer Velankar at
EBI. There's some background here:


https://github.com/flaxsearch/BioSolr
https://www.ebi.ac.uk/spot/BioSolr/

Various ontology indexing code for Solr was developed as part of this 
project.


Best

Charlie


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk