suggestors on shingles

2017-07-12 Thread govind nitk
Hi,

I have a fieldtype "suggestion" with definition as:


[The fieldType XML was stripped by the mailing list archive; only its formatting survives. From the thread title, the analysis chain presumably included a shingle filter.]
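For illustration, a shingle-producing fieldType along these lines would match the thread title (a sketch with assumed tokenizer, filters and parameters, not the poster's exact XML):

<fieldType name="suggestion" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit 2- and 3-word shingles alongside single terms -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>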

I have a field named "mysuggestion" with definition as:

[The field XML was stripped by the archive; "mysuggestion" presumably uses the "suggestion" fieldType above.]

I copy other fields (names, countries, short description) to
"mysuggestion".


I am building a suggester on top of this field as:

[The suggester XML was stripped by the archive. The values that survive, in order: fuzzySuggester, FuzzyLookupFactory, DocumentDictionaryFactory, true, true, mysuggestion, true, suggestion.]
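Laid out as a typical SuggestComponent definition, those values would look roughly like this (a sketch; the element structure and the names of the three boolean options are assumptions, since the archive stripped the tags):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">fuzzySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">mysuggestion</str>
    <str name="suggestAnalyzerFieldType">suggestion</str>
    <!-- assumed names for the three options set to true -->
    <str name="exactMatchFirst">true</str>
    <str name="buildOnStartup">true</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>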



Expectation: returned suggestions should be shingles, not the entire line of
description or name.

1. Is it possible to pass suggesters a tokenized/analyzed field?
2. Is it possible to retrieve tokenized values from Solr?


Regards,
Govind


Re: Is JSON facet output removing characters like \t from output

2017-07-12 Thread Erick Erickson
Then you'll have to scrub the data on the way in.

Or change the type to something like KeywordTokenizer and use
PatternReplaceCharFilter(Factory) to get rid of unwanted stuff.
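For example, a fieldType along these lines strips the tabs before the keyword tokenizer sees the value (a sketch; the type name and pattern are illustrative):

<fieldType name="string_scrubbed" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- collapse tabs and newlines to a single space before tokenizing -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[\t\r\n]+" replacement=" "/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>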

Best,
Erick

On Wed, Jul 12, 2017 at 7:07 PM, Zheng Lin Edwin Yeo
 wrote:
> The field which I am bucketing on is indexed as a String field, and does not
> pass through any tokenizers.
>
> Regards,
> Edwin
>
> On 12 July 2017 at 21:52, Susheel Kumar  wrote:
>
>> I checked on 6.6 and don't see any such issues. I assume the field you are
>> bucketing on is string/keywordtokenizer not text/analyzed field.
>>
>>
>> ===
>>
>> "facets":{
>>
>> "count":5,
>>
>> "myfacet":{
>>
>>   "buckets":[{
>>
>>   "val":"A\t\t\t",
>>
>>   "count":2},
>>
>> {
>>
>>   "val":"L\t\t\t",
>>
>>   "count":1},
>>
>> {
>>
>>   "val":"P\t\t\t",
>>
>>   "count":1},
>>
>> {
>>
>>   "val":"Z\t\t\t",
>>
>>   "count":1}]}}}
>>
>> On Wed, Jul 12, 2017 at 2:31 AM, Zheng Lin Edwin Yeo
>> wrote:
>>
>> > Hi,
>> >
>> > Would like to check, does JSON facet output remove characters like \t
>> from
>> > its output?
>> >
>> > Currently, we found that if the result is not in the last result set, the
>> > characters like \t will be removed from the output. However, if it is the
>> > last result set, the \t will not be removed.
>> >
>> > As there is discrepancy in the results being returned, is this
>> considered a
>> > bug in the output of the JSON facet?
>> >
>> > I'm using Solr 6.5.1.
>> >
>> > Snapshot of output when \t is not removed:
>> >
>> >   "description":{
>> > "buckets":[{
>> >"val":"detaildescription\t\t\t\t",
>> > "count":1}]},
>> >
>> > Snapshot of output when \t is removed:
>> >
>> >   "description":{
>> > "buckets":[{
>> >"val":"detaildescription",
>> > "count":1}]},
>> >
>> > Regards,
>> > Edwin
>> >
>>


Re: Placing different collections on different hard disk/folder

2017-07-12 Thread Shawn Heisey
On 7/12/2017 8:25 PM, Zheng Lin Edwin Yeo wrote:
> Thanks Shawn and Erick.
>
> We are planning to migrate the data in two of the largest collections to
> another hard disk, while the rest of the collections remains at the default
> one for the main core.
> So there are already data indexed in the collections.
> Will this method work, or we have to create a new collection, edit the
> core.properties files, then transfer the index files to this new collection?

Yes.

You can use the method Erick mentioned, or you can use the alternate
method outlined in the paragraph below.  For Erick's idea, you use the
Collections API to add a new replica with a new dataDir property, and
let Solr copy the index data for you, then use DELETEREPLICA to get rid
of the one in the wrong location.  No downtime for any Solr instances,
and you don't have to stop indexing.

For the alternate method, you'll need to stop Solr, edit
core.properties, and copy the contents of the corename/data directory to
the location pointed at by the added dataDir property in the
core.properties file.  Then when you restart Solr, it will use the
existing data in the new location.  You can check the core overview in
the admin UI to make sure that the dataDir has been updated before you
delete the old data directory.
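For example, a core.properties edited this way might look like the following (a sketch; the core name and path are illustrative):

name=collection1_shard1_replica1
dataDir=/mnt/disk2/solr/collection1_shard1_replica1/data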

Thanks,
Shawn



Re: CDCR - how to deal with the transaction log files

2017-07-12 Thread Xie, Sean
Try running a second data import or any other indexing job after the
replication of the first data import is completed.

My observation is that during the replication period (when there are docs
in the queue), tlog cleanup will not be triggered. So when the queue is 0,
submit a second batch and monitor the queue and tlogs again.
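For example, the queue can be checked like this (a sketch; host and collection name are illustrative):

curl "http://sourcehost:8983/solr/mycollection/cdcr?action=QUEUES"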

-- Thank you
Sean

From: jmyatt
Date: Wednesday, Jul 12, 2017, 6:58 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log files

glad to hear you found your solution!  I have been combing over this post and
others on this discussion board many times and have tried so many tweaks to
configuration, order of steps, etc, all with absolutely no success in
getting the Source cluster tlogs to delete.  So incredibly frustrating.  If
anyone has other pearls of wisdom I'd love some advice.  Quick hits on what
I've tried:

- solrconfig exactly like Sean's (target and source respectively) except no
autoSoftCommit
- I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
target) explicitly before starting since the config setting of
defaultState=disabled doesn't seem to work
- when I create the collection on source first, I get the warning "The log
reader for target collection {collection name} is not initialised".  When I
reverse the order (create the collection on target first), no such warning
- tlogs replicate as expected, hard commits on both target and source cause
tlogs to rollover, etc - all of that works as expected
- action=QUEUES on source reflects the queueSize accurately.  Also *always*
shows updateLogSynchronizer state as "stopped"
- action=LASTPROCESSEDVERSION on both source and target always seems correct
(I don't see the -1 that Sean mentioned).
- I'm creating new collections every time and running full data imports that
take 5-10 minutes. Again, all data replication, log rollover, and autocommit
activity seems to work as expected, and logs on target are deleted.  It's
just those pesky source tlogs I can't get to delete.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-tp4345062p4345715.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Placing different collections on different hard disk/folder

2017-07-12 Thread Zheng Lin Edwin Yeo
Thanks Shawn and Erick.

We are planning to migrate the data in two of the largest collections to
another hard disk, while the rest of the collections remains at the default
one for the main core.
So there are already data indexed in the collections.
Will this method work, or we have to create a new collection, edit the
core.properties files, then transfer the index files to this new collection?

Regards,
Edwin


On 12 July 2017 at 23:07, Erick Erickson  wrote:

> Shawn's way will work, of course you have to be sure you didn't index
> any data before editing all the core.properties files.
>
> There's another way to set the dataDir per core though that has the
> advantage of not entailing any down time or hand editing files:
> > create the collection with createNodeSet=EMPTY. No replicas are created
> > Now add each replica with ADDREPLICA. In addition to all the regular
> params, use property.dataDir=place_you_want_the_index
>
> The ADDREPLICA property.blah param is intended exactly to set
> arbitrary properties in core.properties for the replica...
>
> Best,
> Erick
>
> On Wed, Jul 12, 2017 at 6:24 AM, Shawn Heisey  wrote:
> > On 7/12/2017 12:38 AM, Zheng Lin Edwin Yeo wrote:
> >> I found that we can set the path under <dataDir> in solrconfig.xml
> >>
> >> However, this seems to work only if there is one replica. How do we set
> it
> >> if we have 2 or more replica?
> >
> > Setting dataDir in solrconfig.xml is something that really only works in
> > standalone Solr.  For SolrCloud, this method has issues that are
> > difficult to get around.
> >
> > Another option that works in ANY Solr mode is changing dataDir in the
> > core.properties file that every core uses.  Create the collection,
> > allowing Solr to create the cores in the default way.  Shut down Solr
> > and edit the core.properties file for each core that you want to have
> > the data in a different location.  Add a dataDir property pointing at
> > the new location for that core's data.  If the core actually has any
> > contents, you can move the data to that location, but if not, you can
> > simply let Solr create the data itself when it starts back up.
> >
> > The core.properties file is in Java properties format, which is well
> > documented in multiple places around the Internet.
> >
> > https://www.google.com/search?q=java+properties+format&ie=utf-8&oe=utf-8
> >
> > If the dataDir location is not an absolute path, then it will be
> > relative to the instanceDir -- the place where core.properties is.  The
> > dataDir value defaults to a simple value of "data".
> >
> > Thanks,
> > Shawn
> >
>


Re: Is JSON facet output removing characters like \t from output

2017-07-12 Thread Zheng Lin Edwin Yeo
The field which I am bucketing on is indexed as a String field, and does not
pass through any tokenizers.

Regards,
Edwin

On 12 July 2017 at 21:52, Susheel Kumar  wrote:

> I checked on 6.6 and don't see any such issues. I assume the field you are
> bucketing on is string/keywordtokenizer not text/analyzed field.
>
>
> ===
>
> "facets":{
>
> "count":5,
>
> "myfacet":{
>
>   "buckets":[{
>
>   "val":"A\t\t\t",
>
>   "count":2},
>
> {
>
>   "val":"L\t\t\t",
>
>   "count":1},
>
> {
>
>   "val":"P\t\t\t",
>
>   "count":1},
>
> {
>
>   "val":"Z\t\t\t",
>
>   "count":1}]}}}
>
> On Wed, Jul 12, 2017 at 2:31 AM, Zheng Lin Edwin Yeo
> wrote:
>
> > Hi,
> >
> > Would like to check, does JSON facet output remove characters like \t
> from
> > its output?
> >
> > Currently, we found that if the result is not in the last result set, the
> > characters like \t will be removed from the output. However, if it is the
> > last result set, the \t will not be removed.
> >
> > As there is discrepancy in the results being returned, is this
> considered a
> > bug in the output of the JSON facet?
> >
> > I'm using Solr 6.5.1.
> >
> > Snapshot of output when \t is not removed:
> >
> >   "description":{
> > "buckets":[{
> >"val":"detaildescription\t\t\t\t",
> > "count":1}]},
> >
> > Snapshot of output when \t is removed:
> >
> >   "description":{
> > "buckets":[{
> >"val":"detaildescription",
> > "count":1}]},
> >
> > Regards,
> > Edwin
> >
>


Re: CDCR - how to deal with the transaction log files

2017-07-12 Thread jmyatt
glad to hear you found your solution!  I have been combing over this post and
others on this discussion board many times and have tried so many tweaks to
configuration, order of steps, etc, all with absolutely no success in
getting the Source cluster tlogs to delete.  So incredibly frustrating.  If
anyone has other pearls of wisdom I'd love some advice.  Quick hits on what
I've tried:

- solrconfig exactly like Sean's (target and source respectively) except no
autoSoftCommit
- I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
target) explicitly before starting since the config setting of
defaultState=disabled doesn't seem to work
- when I create the collection on source first, I get the warning "The log
reader for target collection {collection name} is not initialised".  When I
reverse the order (create the collection on target first), no such warning
- tlogs replicate as expected, hard commits on both target and source cause
tlogs to rollover, etc - all of that works as expected
- action=QUEUES on source reflects the queueSize accurately.  Also *always*
shows updateLogSynchronizer state as "stopped"
- action=LASTPROCESSEDVERSION on both source and target always seems correct
(I don't see the -1 that Sean mentioned).
- I'm creating new collections every time and running full data imports that
take 5-10 minutes. Again, all data replication, log rollover, and autocommit
activity seems to work as expected, and logs on target are deleted.  It's
just those pesky source tlogs I can't get to delete.
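For anyone comparing notes, the explicit DISABLEBUFFER calls look like this (a sketch; hosts and collection name are illustrative):

curl "http://sourcehost:8983/solr/mycollection/cdcr?action=DISABLEBUFFER"
curl "http://targethost:8983/solr/mycollection/cdcr?action=DISABLEBUFFER"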



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-tp4345062p4345715.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cluster without sharding

2017-07-12 Thread Erick Erickson
1> I would not do this. First there are the lock issues you mentioned.
But let's say replica1 is your indexer and replicas 2 and 3 point to
the same index. When replica1 commits, how do replicas 2 and 3 know to
open a new searcher?

<2> and <3> just seem like variants of coupling Solr instances to
collections, which I'd advise against.

I'd just have a single collection with 1 shard and that shard has as
many replicas as you need, spread across as many Solr instances as you
want. CloudSolrClient takes care of load balancing with an internal
software load balancer and is aware of ZooKeeper so it can "do the
right thing". Updates get sent to all replicas and indexed locally. Do
not try to share indexes.
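A sketch of that setup with the Collections API (names illustrative):

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=1&replicationFactor=3&collection.configName=myconf"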

You get all the HA/DR of SolrCloud.

If that doesn't work, _then_ worry about more complex schemes.

Best,
Erick

On Wed, Jul 12, 2017 at 12:58 PM, Mikhail Ibraheem
 wrote:
> Hi,
>
> We are using some features like collapse and joins that force us not to
> use sharding for now. Still I am checking for possibilities for load
> balancing and high availability.
>
> 1- I think about using many Solr instances running against the same shard
> file system. This way all instances will work with the same data. I know
> there may be issues with synchronization and open searchers. But my main
> concern with this: will we have some lock issues, like deadlock between
> instances?
> 2- Having some collections owned by each instance. For example, if I have
> 9 collections and 3 Solr instances, I will divide the collections so that
> 3 collections are owned by each instance.
> 3- Can I influence the order of the solrCloud client? I mean, if I have 3
> instances ins1, ins2 and ins3, am I able to ask the solrCloudClient to try
> ins1 first, then ins2 and finally ins3?
>
> Any more suggestions are more than appreciated.
>
> Thanks,
> Mikhail


accessing numfound value

2017-07-12 Thread Steve Pruitt
I'm having difficulty finding the value for numFound that is in the response.  
My context is a custom component in the last-components list for /select.

Where rb is the ResponseBuilder parameter for the process(..) method:

rb.getNumberDocumentsFound() is 0.
rb.totalHitCount is 0.

I don't understand why those values are 0.

rb.getResults().docList.size() returns only the row size from the query.  I 
need the total hits.

In the JavaDoc it states rb.rsp.getResponse() returns just an Object.  I found 
it to be an instance of BasicResultContext in my case.  But, nothing in its 
description hints at how to get at the total found.
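If it helps, in a last-components stage the total hit count is normally available from the DocList itself. A minimal sketch (Solr 6.x API; assumes rb.getResults() has been populated by the query component, and "myNumFound" is an illustrative key):

    // imports: org.apache.solr.search.DocList, java.io.IOException
    @Override
    public void process(ResponseBuilder rb) throws IOException {
        DocList docs = rb.getResults().docList;
        int numFound = docs.matches();       // total hits, i.e. numFound
        rb.rsp.add("myNumFound", numFound);  // expose it in the response
    }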

Can someone help?

Thanks.

-S


Re: Auto commit Error - Solr Cloud 6.6.0 with HDFS

2017-07-12 Thread Joe Obernberger

Thank you Shawn - we have some WARN messages like :

DFSClient
Slow waitForAckedSeqno took 31066ms (threshold=3ms)

and

DFSClient
Slow ReadProcessor read fields took 30737ms (threshold=3ms); ack: 
seqno: 1 reply: SUCCESS reply: SUCCESS reply: SUCCESS 
downstreamAckTimeNanos: 30735948057, targets: 
[DatanodeInfoWithStorage[172.16.100.223:50010,DS-fc197ece-4c43-46a4-964e-9ecc8adcdcb1,DISK], 
DatanodeInfoWithStorage[172.16.100.228:50010,DS-fd9fe2ea-780b-490a-b9a7-ad54c994e307,DISK], 
DatanodeInfoWithStorage[172.16.20.17:50010,DS-63b38ca5-b674-4cdc-8102-f07a22709dc1,DISK]]


I was getting the log ready for you, but it was overwritten in the 
interim.  If it happens again, I'll get the log file ready.


-Joe


On 7/12/2017 9:25 AM, Shawn Heisey wrote:

On 7/12/2017 7:14 AM, Joe Obernberger wrote:

Started up a 6.6.0 solr cloud instance running on 45 machines
yesterday using HDFS (managed schema in zookeeper) and began
indexing.  This error occurred on several of the nodes:



Caused by: org.apache.solr.common.SolrException: openNewSearcher
called on closed core

There's the important part of the error.

For some reason, which is not immediately clear from the information
provided, the core is closed.  In that situation, Solr is not able to
open a new searcher, so this error happens.  Do you have any other WARN
or ERROR messages in solr.log before this error?  You might want to find
a way to share an entire logfile and provide a URL for accessing it.

Thanks,
Shawn







Cluster without sharding

2017-07-12 Thread Mikhail Ibraheem
Hi,

We are using some features like collapse and joins that force us not to use
sharding for now. Still I am checking for possibilities for load balancing
and high availability.

1- I think about using many Solr instances running against the same shard
file system. This way all instances will work with the same data. I know
there may be issues with synchronization and open searchers. But my main
concern with this: will we have some lock issues, like deadlock between
instances?
2- Having some collections owned by each instance. For example, if I have 9
collections and 3 Solr instances, I will divide the collections so that 3
collections are owned by each instance.
3- Can I influence the order of the solrCloud client? I mean, if I have 3
instances ins1, ins2 and ins3, am I able to ask the solrCloudClient to try
ins1 first, then ins2 and finally ins3?

Any more suggestions are more than appreciated.

Thanks,
Mikhail

Re: Enabling SSL

2017-07-12 Thread Nawab Zada Asad Iqbal
I guess your certificates are self-generated?  In that case, this is a
browser nanny trying to protect you.
I also get the same error in Firefox; however, Chrome was a little more
forgiving. It showed me an option to choose my certificate (the client
certificate), and then bypassed the safety barrier.
I should add that even Chrome didn't show me that 'select certificate'
option on the first attempt, so I don't know what caused it to trigger.

Here is a relevant thread about Firefox:
https://bugzilla.mozilla.org/show_bug.cgi?id=1255049


Let me know how it worked for you, as I am still learning this myself.


Regards
Nawab



On Wed, Jul 12, 2017 at 9:05 AM, Miller, William K - Norman, OK -
Contractor  wrote:

> I am not using Zookeeper.  Is the urlScheme also used outside of Zookeeper?
>
>
>
>
> ~~~
> William Kevin Miller
>
> ECS Federal, Inc.
> USPS/MTSC
> (405) 573-2158
>
>
> -Original Message-
> From: esther.quan...@lucidworks.com [mailto:esther.quan...@lucidworks.com]
> Sent: Wednesday, July 12, 2017 10:58 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Enabling SSL
>
> Hi William,
>
> You should be able to navigate to https://localhost:8983/solr (albeit
> with your host:port) to access the admin UI, provided you updated the
> urlScheme property in the Zookeeper cluster props.
>
> Did you complete that step?
>
> Esther
> Search Engineer
> Lucidworks
>
>
>
> > On Jul 12, 2017, at 08:20, Miller, William K - Norman, OK - Contractor <
> william.k.mil...@usps.gov.INVALID> wrote:
> >
> > I am trying to enable SSL and I have followed the instructions in the
> Solr 6.4 reference manual, but when I restart my Solr server and try to
> access the Solr Admin page I am getting:
> >
> > “This page isn’t working”;
> >  sent an invalid response; ERR_INVALID_HTTP_RESPONSE
> >
> > Does the Solr server need to be on a secure server in order to enable
> SSL?
> >
> >
> > Additional Info:
> > Running Solr 6.5.1 on Linux OS
> >
> >
> >
> >
> > ~~~
> > William Kevin Miller
> >
> > ECS Federal, Inc.
> > USPS/MTSC
> > (405) 573-2158
> >
>


Re: Solr Cloud 6.x - rollback best practice

2017-07-12 Thread Mike Drob
The two collection approach with aliasing is a good approach.

You can also use the backup and restore APIs -
https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html
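A sketch of the alias swap (alias and collection names illustrative):

curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_20170712"

Queries keep hitting the "products" alias; re-running CREATEALIAS repoints
it to the freshly built collection, and the old one stays around in case
you need to fall back.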

Mike

On Wed, Jul 12, 2017 at 10:57 AM, Vincenzo D'Amore 
wrote:

> Hi,
>
> I'm moving to Solr Cloud 6.x and I see rollback cannot be supported when it
> is in Cloud mode.
>
> In my scenario, there are basically two tasks (full indexing, partial
> indexing).
>
> Full indexing
> =
>
> This is the most important case, where I really need the possibility to
> rollback.
>
> The full reindex is basically done in 3 steps:
>
> 1. delete *:* all collection's documents
> 2. add all existing documents
> 3. commit
>
> If during step 2 something goes wrong (usually some problem with the
> source of data) I have to roll back.
>
> Partial reindexing
> =
>
> Unlike the former, this case is executed in only 2 steps (no delete)
> and the number of documents indexed usually is small (or very small).
>
> Even in this case, if step 2 goes wrong I have to roll back.
>
> Do you know if there is a common pattern, a best practice, something
> useful to handle a rollback if something goes wrong in these cases?
>
> My simplistic idea is to have two collections (active/passive), and switch
> from one to another only when all the steps are completed successfully.
>
> But, as you can understand, having two collections works well with full
> indexing, but how do I handle a partial reindexing if something goes wrong?
>
> So, I'll be grateful to whom would spend his/her time to give me a
> suggestion.
>
> Thanks in advance and best regards,
> Vincenzo
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


RE: Enabling SSL

2017-07-12 Thread Miller, William K - Norman, OK - Contractor
I am not using Zookeeper.  Is the urlScheme also used outside of Zookeeper?




~~~
William Kevin Miller

ECS Federal, Inc.
USPS/MTSC
(405) 573-2158


-Original Message-
From: esther.quan...@lucidworks.com [mailto:esther.quan...@lucidworks.com] 
Sent: Wednesday, July 12, 2017 10:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Enabling SSL

Hi William,

You should be able to navigate to https://localhost:8983/solr (albeit with 
your host:port) to access the admin UI, provided you updated the urlScheme 
property in the Zookeeper cluster props. 

Did you complete that step?

Esther
Search Engineer
Lucidworks 



> On Jul 12, 2017, at 08:20, Miller, William K - Norman, OK - Contractor 
>  wrote:
> 
> I am trying to enable SSL and I have followed the instructions in the Solr 
> 6.4 reference manual, but when I restart my Solr server and try to access the 
> Solr Admin page I am getting:
>  
> “This page isn’t working”;
>  sent an invalid response; ERR_INVALID_HTTP_RESPONSE
>  
> Does the Solr server need to be on a secure server in order to enable SSL?
>  
>  
> Additional Info:
> Running Solr 6.5.1 on Linux OS
>  
>  
>  
>  
> ~~~
> William Kevin Miller
> 
> ECS Federal, Inc.
> USPS/MTSC
> (405) 573-2158
>  


Re: Enabling SSL

2017-07-12 Thread esther . quansah
Hi William,

You should be able to navigate to https://localhost:8983/solr (albeit with 
your host:port) to access the admin UI, provided you updated the urlScheme 
property in the Zookeeper cluster props. 
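That step looks like this with the zkcli script that ships with Solr (a sketch; the ZooKeeper address is illustrative):

server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd clusterprop -name urlScheme -val https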

Did you complete that step?

Esther
Search Engineer
Lucidworks 



> On Jul 12, 2017, at 08:20, Miller, William K - Norman, OK - Contractor 
>  wrote:
> 
> I am trying to enable SSL and I have followed the instructions in the Solr 
> 6.4 reference manual, but when I restart my Solr server and try to access the 
> Solr Admin page I am getting:
>  
> “This page isn’t working”;
>  sent an invalid response;
> ERR_INVALID_HTTP_RESPONSE
>  
> Does the Solr server need to be on a secure server in order to enable SSL?
>  
>  
> Additional Info:
> Running Solr 6.5.1 on Linux OS
>  
>  
>  
>  
> ~~~
> William Kevin Miller
> 
> ECS Federal, Inc.
> USPS/MTSC
> (405) 573-2158
>  


Solr Cloud 6.x - rollback best practice

2017-07-12 Thread Vincenzo D'Amore
Hi,

I'm moving to Solr Cloud 6.x and I see rollback cannot be supported when it
is in Cloud mode.

In my scenario, there are basically two tasks (full indexing, partial
indexing).

Full indexing
=

This is the most important case, where I really need the possibility to
rollback.

The full reindex is basically done in 3 steps:

1. delete *:* all collection's documents
2. add all existing documents
3. commit

If during step 2 something goes wrong (usually some problem with the
source of data) I have to roll back.

Partial reindexing
=

Unlike the former, this case is executed in only 2 steps (no delete)
and the number of documents indexed usually is small (or very small).

Even in this case, if step 2 goes wrong I have to roll back.

Do you know if there is a common pattern, a best practice, something
useful to handle a rollback if something goes wrong in these cases?

My simplistic idea is to have two collections (active/passive), and switch
from one to another only when all the steps are completed successfully.

But, as you can understand, having two collections works well with full
indexing, but how do I handle a partial reindexing if something goes wrong?

So, I'll be grateful to whom would spend his/her time to give me a
suggestion.

Thanks in advance and best regards,
Vincenzo



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Do I need to declare TermVectorComponent for best MoreLikeThis results?

2017-07-12 Thread Max Bridgewater
Hi,

The MLT documentation says that for best results, the fields should have
stored term vectors in schema.xml, with:

<field name="..." termVectors="true" />

My question: should I also create the TermVectorComponent and declare it in
the search handler?

In other terms, do I have to do this in my solrconfig.xml for best results?

<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>

<requestHandler name="..." class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>


I am seeing continuously increasing MLT response times and I am wondering
if I am doing something wrong.

Thanks.
Max.


Enabling SSL

2017-07-12 Thread Miller, William K - Norman, OK - Contractor
I am trying to enable SSL and I have followed the instructions in the Solr 6.4 
reference manual, but when I restart my Solr server and try to access the Solr 
Admin page I am getting:

"This page isn't working";
 sent an invalid response;
ERR_INVALID_HTTP_RESPONSE

Does the Solr server need to be on a secure server in order to enable SSL?


Additional Info:
Running Solr 6.5.1 on Linux OS




~~~
William Kevin Miller
ECS Federal, Inc.
USPS/MTSC
(405) 573-2158



Re: Unable to integrate OpenNLP with Solr

2017-07-12 Thread Erick Erickson
There's not much anybody can do unless you tell us what the problem
you're having is. What have you tried? Exactly _how_ does it not work?

Please read:
https://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Wed, Jul 12, 2017 at 5:58 AM, Sweta Parekh  wrote:
> Hi All,
>
> We are using Solr 6.6 and trying to integrate OpenNLP but are unable to do 
> that using patch LUCENE-2899. Can someone help or provide instructions for 
> the same?
>
>
> Regards,
> Sweta Parekh


Re: Unable to integrate OpenNLP with Solr

2017-07-12 Thread jpereira
Hi Sweta,

I recently adapted that patch to a Solr instance running version 6.4. If my
memory does not fail me, I think the only changes I had to make were
updating the package imports for the latest OpenNLP version (I am using
OpenNLP 1.8):



What problem are you struggling with, exactly?

Best,

João



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unable-to-integrate-OpenNLP-with-Solr-tp4345601p4345626.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Placing different collections on different hard disk/folder

2017-07-12 Thread Erick Erickson
Shawn's way will work, of course you have to be sure you didn't index
any data before editing all the core.properties files.

There's another way to set the dataDir per core though that has the
advantage of not entailing any down time or hand editing files:
> create the collection with createNodeSet=EMPTY. No replicas are created
> Now add each replica with ADDREPLICA. In addition to all the regular params, 
> use property.dataDir=place_you_want_the_index

The ADDREPLICA property.blah param is intended exactly to set
arbitrary properties in core.properties for the replica...
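A sketch of the two calls (collection, config, node and path are illustrative):

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=bigcoll&numShards=1&collection.configName=myconf&createNodeSet=EMPTY"
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=bigcoll&shard=shard1&node=host1:8983_solr&property.dataDir=/mnt/disk2/solr/bigcoll"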

Best,
Erick

On Wed, Jul 12, 2017 at 6:24 AM, Shawn Heisey  wrote:
> On 7/12/2017 12:38 AM, Zheng Lin Edwin Yeo wrote:
> >> I found that we can set the path under <dataDir> in solrconfig.xml
>>
>> However, this seems to work only if there is one replica. How do we set it
>> if we have 2 or more replica?
>
> Setting dataDir in solrconfig.xml is something that really only works in
> standalone Solr.  For SolrCloud, this method has issues that are
> difficult to get around.
>
> Another option that works in ANY Solr mode is changing dataDir in the
> core.properties file that every core uses.  Create the collection,
> allowing Solr to create the cores in the default way.  Shut down Solr
> and edit the core.properties file for each core that you want to have
> the data in a different location.  Add a dataDir property pointing at
> the new location for that core's data.  If the core actually has any
> contents, you can move the data to that location, but if not, you can
> simply let Solr create the data itself when it starts back up.
>
> The core.properties file is in Java properties format, which is well
> documented in multiple places around the Internet.
>
> https://www.google.com/search?q=java+properties+format&ie=utf-8&oe=utf-8
>
> If the dataDir location is not an absolute path, then it will be
> relative to the instanceDir -- the place where core.properties is.  The
> dataDir value defaults to a simple value of "data".
>
> Thanks,
> Shawn
>


Re: Tlogs not being deleted/truncated

2017-07-12 Thread Webster Homer
We have buffers disabled as described in the CDCR documentation. We also
have autoCommit set for hard commits, but openSearcher false. We also have
autoSoftCommit set.


On Tue, Jul 11, 2017 at 5:00 PM, Xie, Sean  wrote:

> Please see my previous thread. I have to disable buffer on source cluster
> and a scheduled hard commit with scheduled logscheduler to make it work.
>
>
> -- Thank you
> Sean
>
> From: jmyatt
> Date: Tuesday, Jul 11, 2017, 1:56 PM
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Re: Tlogs not being deleted/truncated
>
> another interesting clue in my case (different from what Webster Homer is
> seeing): the response from /cdcr?action=QUEUES reflects what I would expect
> to see in the tlog directory but it's not accurate.  By that I mean
> tlogTotalSize shows 1500271 (bytes) and tlogTotalCount shows 2.  This
> changes as more updates come in and autoCommit runs - sometimes
> tlogTotalCount is 1 instead of 2, and the tlogTotalSize changes but stays
> in
> that low range.
>
> But on the filesystem, all the tlogs are still there.  Perhaps the ignored
> exception noted above is in fact a problem?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tlogs-not-being-deleted-truncated-tp4341958p4345477.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



Re: Auto commit Error - Solr Cloud 6.6.0 with HDFS

2017-07-12 Thread Shawn Heisey
On 7/12/2017 7:14 AM, Joe Obernberger wrote:
> Started up a 6.6.0 solr cloud instance running on 45 machines
> yesterday using HDFS (managed schema in zookeeper) and began
> indexing.  This error occurred on several of the nodes:

> Caused by: org.apache.solr.common.SolrException: openNewSearcher
> called on closed core

There's the important part of the error.

For some reason, which is not immediately clear from the information
provided, the core is closed.  In that situation, Solr is not able to
open a new searcher, so this error happens.  Do you have any other WARN
or ERROR messages in solr.log before this error?  You might want to find
a way to share an entire logfile and provide a URL for accessing it.

Thanks,
Shawn



Re: Is JSON facet output removing characters like \t from output

2017-07-12 Thread Susheel Kumar
I checked on 6.6 and don't see any such issues. I assume the field you are
bucketing on is string/keywordtokenizer not text/analyzed field.


===

"facets":{

"count":5,

"myfacet":{

  "buckets":[{

  "val":"A\t\t\t",

  "count":2},

{

  "val":"L\t\t\t",

  "count":1},

{

  "val":"P\t\t\t",

  "count":1},

{

  "val":"Z\t\t\t",

  "count":1}]}}}

On Wed, Jul 12, 2017 at 2:31 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> Would like to check, does JSON facet output remove characters like \t from
> its output?
>
> Currently, we found that if the result is not in the last result set, the
> characters like \t will be removed from the output. However, if it is the
> last result set, the \t will not be removed.
>
> As there is discrepancy in the results being returned, is this considered a
> bug in the output of the JSON facet?
>
> I'm using Solr 6.5.1.
>
> Snapshot of output when \t is not removed:
>
>   "description":{
> "buckets":[{
>"val":"detaildescription\t\t\t\t",
> "count":1}]},
>
> Snapshot of output when \t is removed:
>
>   "description":{
> "buckets":[{
>"val":"detaildescription",
> "count":1}]},
>
> Regards,
> Edwin
>


Re: Using HTTP and HTTPS at the same time

2017-07-12 Thread Shawn Heisey
On 7/12/2017 7:20 AM, Nawab Zada Asad Iqbal wrote:
> I am wondering what is wrong if I pass both the http and https ports to the
> underlying jetty server; won't that be enough to have both http and https
> access to Solr?

Jetty should be capable of doing both HTTP and HTTPS (on different
ports), but the instructions that the Solr project provides for SSL do
not set things up that way.

If you know how to configure Jetty, then you can do anything you want,
but the only way of doing SSL that is supported by the Solr project is
the method that disables HTTP.

The reason that we don't support both at the same time is that the
entire reason for enabling SSL is for security purposes.  Leaving HTTP
open defeats that goal.

In my opinion, the best way to secure Solr is to make sure it cannot be
reached from unauthorized locations.  In particular, having Solr
accessible from the open Internet is dangerous.  If Solr is only
reachable from specific authorized network addresses, then you do not
need encryption or authentication.
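On Linux, for instance, that can be as simple as firewall rules like these (a sketch; the source address and port are illustrative):

iptables -A INPUT -p tcp --dport 8983 -s 10.0.0.5 -j ACCEPT
iptables -A INPUT -p tcp --dport 8983 -j DROP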

Thanks,
Shawn



Re: Placing different collections on different hard disk/folder

2017-07-12 Thread Shawn Heisey
On 7/12/2017 12:38 AM, Zheng Lin Edwin Yeo wrote:
> I found that we can set the path under <dataDir> in solrconfig.xml
>
> However, this seems to work only if there is one replica. How do we set it
> if we have 2 or more replica?

Setting dataDir in solrconfig.xml is something that really only works in
standalone Solr.  For SolrCloud, this method has issues that are
difficult to get around.

Another option that works in ANY Solr mode is changing dataDir in the
core.properties file that every core uses.  Create the collection,
allowing Solr to create the cores in the default way.  Shut down Solr
and edit the core.properties file for each core that you want to have
the data in a different location.  Add a dataDir property pointing at
the new location for that core's data.  If the core actually has any
contents, you can move the data to that location, but if not, you can
simply let Solr create the data itself when it starts back up.

The core.properties file is in Java properties format, which is well
documented in multiple places around the Internet.

https://www.google.com/search?q=java+properties+format&ie=utf-8&oe=utf-8

If the dataDir location is not an absolute path, then it will be
relative to the instanceDir -- the place where core.properties is.  The
dataDir value defaults to a simple value of "data".

Thanks,
Shawn



Re: Using HTTP and HTTPS at the same time

2017-07-12 Thread Nawab Zada Asad Iqbal
Thanks rick

I am wondering what is wrong if I pass both the http and https ports to the
underlying jetty server; won't that be enough to have both http and https
access to Solr?

Regards
Nawab

On Wed, Jul 12, 2017 at 3:39 AM Rick Leir  wrote:

> Hi all,
> The recommended best practice is to run a web app in front of Solr, and
> maybe there is no benefit in SSL between the web app and Solr. In any case,
> if SSL is desired, you would configure the web app to always use HTTPS.
>
> Without the web app, you can have Apache promote a connection from http to
> https. (Is 'promote' the right term?) Cheers -- Rick
>
> On July 11, 2017 6:09:42 PM EDT, Nawab Zada Asad Iqbal 
> wrote:
> >Hi,
> >
> >I am reading a comment on
> >https://cwiki.apache.org/confluence/display/solr/Enabling+SSL which
> >says.
> >Just wanted to check if this is still the same with 6.5? This used to
> >work
> >in 4.5.
> >Shalin Shekhar Mangar
> >
> >
> >Solr does not support both HTTP and HTTPS at the same time. You can
> >only
> >use one of them at a time.
> >
> >
> >Thanks
> >
> >Nawab
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Auto commit Error - Solr Cloud 6.6.0 with HDFS

2017-07-12 Thread Joe Obernberger
Started up a 6.6.0 solr cloud instance running on 45 machines yesterday 
using HDFS (managed schema in zookeeper) and began indexing.  This error 
occurred on several of the nodes:


auto commit error...:org.apache.solr.common.SolrException: Error opening 
new searcher

at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2069)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:667)

at org.apache.solr.update.CommitTracker.run(CommitTracker.java:217)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: openNewSearcher called 
on closed core

at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2056)
... 10 more
7/12/2017, 8:53:53 AM

Immediately following this error, the log shows another:

auto commit error...:org.apache.solr.common.SolrException: 
openNewSearcher called on closed core

at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1943)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:678)

at org.apache.solr.update.CommitTracker.run(CommitTracker.java:217)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:748)

Any ideas on what the problem could be?  Thank you!

-Joe



Unable to integrate OpenNLP with Solr

2017-07-12 Thread Sweta Parekh
Hi All,

We are using Solr 6.6 and trying to integrate OpenNLP but are unable to do that 
using patch LUCENE-2899. Can someone help or provide instructions for the same?


Regards,
Sweta Parekh


Re: Using HTTP and HTTPS at the same time

2017-07-12 Thread Rick Leir
Hi all,
The recommended best practice is to run a web app in front of Solr, and maybe 
there is no benefit in SSL between the web app and Solr. In any case, if SSL is 
desired, you would configure the web app to always use HTTPS. 

Without the web app, you can have Apache promote a connection from http to 
https. (Is 'promote' the right term?) Cheers -- Rick

On July 11, 2017 6:09:42 PM EDT, Nawab Zada Asad Iqbal  wrote:
>Hi,
>
>I am reading a comment on
>https://cwiki.apache.org/confluence/display/solr/Enabling+SSL which
>says.
>Just wanted to check if this is still the same with 6.5? This used to
>work
>in 4.5.
>Shalin Shekhar Mangar
>
>
>Solr does not support both HTTP and HTTPS at the same time. You can
>only
>use one of them at a time.
>
>
>Thanks
>
>Nawab

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Collections API Overseer Status

2017-07-12 Thread alessandro.benedetti
+1 
I was trying to understand a collection reload timeout happening lately in
a Solr Cloud cluster, and the Overseer Status output was hard to decipher.

More human-readable names and some additional documentation would help here.

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collections-API-Overseer-Status-tp4345454p4345567.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Placing different collections on different hard disk/folder

2017-07-12 Thread Zheng Lin Edwin Yeo
I found that we can set the path under <dataDir> in solrconfig.xml

However, this seems to work only if there is one replica. How do we set it
if we have 2 or more replica?

Regards,
Edwin


On 6 July 2017 at 11:34, Zheng Lin Edwin Yeo  wrote:

> Hi,
>
> Would like to check, how can we place the indexed files of different
> collections on different hard disk/folder, but they are in the same node?
>
> For example, I want collection1 to be placed in C: drive, collection2 to
> be placed in D: drive, and collection3 to be placed in E: drive.
>
> I am using Solr 6.5.1
>
> Regards,
> Edwin
>


Is JSON facet output removing characters like \t from output

2017-07-12 Thread Zheng Lin Edwin Yeo
Hi,

Would like to check, does JSON facet output remove characters like \t from
its output?

Currently, we found that if the result is not in the last result set, the
characters like \t will be removed from the output. However, if it is the
last result set, the \t will not be removed.

As there is discrepancy in the results being returned, is this considered a
bug in the output of the JSON facet?

I'm using Solr 6.5.1.

Snapshot of output when \t is not removed:

  "description":{
"buckets":[{
   "val":"detaildescription\t\t\t\t",
"count":1}]},

Snapshot of output when \t is removed:

  "description":{
"buckets":[{
   "val":"detaildescription",
"count":1}]},

Regards,
Edwin