Re: Alias field names when searching (not for results)

2018-03-05 Thread Emir Arnautović
Hi,
I did not try it, but the first thing that comes to mind is to use edismax's
ability to define field aliases, something like f.f1.qf=field_1. Note that it
is not recommended to have field names starting with a number, so I am not sure
it will work for "1".
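
Spelled out as request parameters, the idea would be something like this
(untested, so treat it as a sketch):

    q=f1:valueA&defType=edismax&f.f1.qf=field_1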

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 5 Mar 2018, at 17:51, Christopher Schultz  
> wrote:
> 
> 
> All,
> 
> I'd like for users to be able to search a field by multiple names
> without performing a "copy-field" when analyzing a document. Is that
> possible? Whenever I search for "solr alias field" I get results about
> how to re-name fields in the results.
> 
> Here's what I'd like to do. Let's say I have a document:
> 
> {
>  id: 1234,
>  field_1: valueA,
>  field_2: valueB,
>  field_3: valueC
> }
> 
> I'd like users to be able to find this document using any of the
> following queries:
> 
>   field_1:valueA
>   f1:valueA
>   1:valueA
> 
> I just want the query parser to say "oh, 'f1' is an alias for
> 'field_1'" and substitute that when performing the search. Is that
> possible?
> 
> -chris
> 



Re: Atomic updates using solr-php-client

2018-03-05 Thread Sami al Subhi
Thank you for replying,

Yes that is the one. Unfortunately there is no documentation for this
library.

I tried to implement other libraries but I couldn't get them running. This
is the easiest library to implement but lacks support and documentation.

Thank you and best regards,
Sami





Re: Atomic updates using solr-php-client

2018-03-05 Thread Shawn Heisey

On 3/5/2018 11:38 PM, Sami al Subhi wrote:

I am using solr-php-client to interact with Solr. I came across this example
that shows how to add docs:
http://www.ayalon.ch/en/code-samples/solr-php-client-example
I checked the "service.php" file, more specifically the function
"_documentToXmlFragment", and it does not seem to support "atomic updates".
Am I correct? Is the only way to support updates to edit
"_documentToXmlFragment"?


The php clients for Solr are third-party software.  None of them were 
created by the project.


To get help with that software, you need to contact its authors.  It 
seems that there are several codebases with the name "solr-php-client" 
so I do not know which one is the one that you are using.  This MIGHT be 
the correct project, but I can't be sure:


https://github.com/PTCInc/solr-php-client

You'll probably need to examine the library that you're using to see if 
there are hints about where you can get support.


There are a fair number of php clients for Solr.  If that one is not 
doing the job, maybe you can switch to one of the others.


Thanks,
Shawn



Atomic updates using solr-php-client

2018-03-05 Thread Sami al Subhi
I am using solr-php-client to interact with Solr. I came across this example
that shows how to add docs:
http://www.ayalon.ch/en/code-samples/solr-php-client-example
I checked the "service.php" file, more specifically the function
"_documentToXmlFragment", and it does not seem to support "atomic updates".
Am I correct? Is the only way to support updates to edit
"_documentToXmlFragment"?
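
For reference, an atomic update posted straight to Solr's JSON update handler
differs from a normal add only in the field payload, e.g. (collection and
field names here are just placeholders):

    curl -X POST -H 'Content-Type: application/json' \
      'http://localhost:8983/solr/mycollection/update?commit=true' \
      --data-binary '[{"id":"1234","field_1":{"set":"newValue"}}]'

So a client library "supports" atomic updates as soon as it can emit the
{"set": ...} modifier syntax instead of a plain field value.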




Re: Rename solr to another name

2018-03-05 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Ok, thanks for the info.
We will review it.

Regards,
Edwin

On 5 March 2018 at 14:01, Shawn Heisey  wrote:

> On 3/4/2018 9:35 PM, Zheng Lin Edwin Yeo wrote:
>
>> Just to check, is solr.xml hard-coded at the *SolrXmlConfig.java* ?
>>
>> Which Jar file is this class located at, and do we need to recompile it so
>> that it can take on the new name?
>>
>
> I'm pretty sure that is indeed hard-coded in the java code.
>
> I think I'm beginning to see a pattern.  It sounds like you want to
> install Solr without anyone knowing that it's Solr you've installed.
>
> I wish you luck with that.  It's going to take a very large amount of
> effort to eliminate all traces of the software name. And if you ever
> upgrade it, you're going to have to do it all again on the new version.  If
> anyone looks at the installation and has had any experience with Solr at
> all, they are going to know what software it is, even if you remove all
> mention of the name.
>
> To accomplish that goal, obtain the services of somebody who's very
> talented at system administration and Java programming.  Expect to pay them
> well.  Walter is right -- nobody here is interested in doing the job you
> want done for free via email.
>
> Thanks,
> Shawn
>
>


Re: statistics in hitlist

2018-03-05 Thread Joel Bernstein
I suspect you've got nulls in your data. I just tested with null values and
got the same error. For testing purposes try loading the data with default
values of zero.
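
One way to do that without touching the source data, assuming the pint fields
from this thread, is a default on the field definition so documents missing
the value index a zero:

    <field name="oil_first_90_days_production" type="pint" indexed="true"
           stored="true" default="0"/>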


Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 5, 2018 at 10:12 PM, Joel Bernstein  wrote:

> Let's break the expression down and build it up slowly. Let's start with:
>
> let(echo="true",
>  a=random(tx_prod_production, q="*:*", fq="isParent:true", rows="15",
> fl="oil_first_90_days_production,oil_last_30_days_production"),
>  b=col(a, oil_first_90_days_production))
>
>
> This should return variables a and b. Let's see what the data looks like.
> I changed the rows from 15 to 15000. If it all looks good we can expand the
> rows and continue adding functions.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Mar 5, 2018 at 4:11 PM, John Smith  wrote:
>
>> Thanks Joel for your help on this.
>>
>> What I've done so far:
>> - unzip downloaded solr-7.2
>> - modify the _default "managed-schema" to add the random field type and
>> the dynamic random field
>> - start solr7 using "solr start -c"
>> - indexed my data using pint/pdouble/boolean field types etc
>>
>> I can now run the random function all by itself, it returns random
>> results as expected. So far so good!
>>
>> However... now trying to get the regression stuff working:
>>
>> let(a=random(tx_prod_production, q="*:*", fq="isParent:true",
>> rows="15000", fl="oil_first_90_days_production,oil_last_30_days_production"),
>> b=col(a, oil_first_90_days_production),
>> c=col(a, oil_last_30_days_production),
>> d=regress(b, c))
>>
>> Posted directly into solr admin UI. Run the streaming expression and I
>> get this error message:
>> "EXCEPTION": "Failed to evaluate expression regress(b,c) - Numeric value
>> expected but found type java.lang.String for value
>> oil_first_90_days_production"
>>
>> It thinks my numeric field is defined as a string? But when I view the
>> schema, those 2 fields are defined as ints:
>>
>>
>> When I run a normal query and choose xml as output format, then it also
>> puts "int" elements into the hitlist, so the schema appears to be correct
>> it's just when using this regress function that something goes wrong and
>> solr thinks the field is string.
>>
>> Any suggestions?
>> Thanks!
>>
>>
>>
>> On Thu, Mar 1, 2018 at 9:12 PM, Joel Bernstein 
>> wrote:
>>
>>> The field type will also need to be in the schema:
>>>
>>> <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
>>>
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Thu, Mar 1, 2018 at 8:00 PM, Joel Bernstein 
>>> wrote:
>>>
>>> > You'll need to have this field in your schema:
>>> >
>>> > <dynamicField name="random_*" type="random"/>
>>> >
>>> > I'll check to see if the default schema used with solr start -c has
>>> this
>>> > field, if not I'll add it. Thanks for pointing this out.
>>> >
>>> > I checked and right now the random expression is only accepting one fq,
>>> > but I consider this a bug. It should accept multiple. I'll create a
>>> > ticket for getting this fixed.
>>> >
>>> >
>>> >
>>> > Joel Bernstein
>>> > http://joelsolr.blogspot.com/
>>> >
>>> > On Thu, Mar 1, 2018 at 4:55 PM, John Smith 
>>> wrote:
>>> >
>>> >> Joel, thanks for the pointers to the streaming feature. I had no idea
>>> solr
>>> >> had that (and also just discovered the very interesting sql feature! I
>>> will
>>> >> be sure to investigate that in more detail in the future).
>>> >>
>>> >> However I'm having some trouble getting basic streaming functions
>>> working.
>>> >> I've already figured out that I had to move to "solr cloud" instead of
>>> >> "solr standalone" because I was getting errors about "cannot find zk
>>> >> instance" or whatever which went away when using "solr start -c"
>>> instead.
>>> >>
>>> >> But now I'm trying to use the random function since that was one of
>>> the
>>> >> functions used in your example.
>>> >>
>>> >> random(tx_header, q="*:*", rows="100", fl="countyname")
>>> >>
>>> >> I posted that directly in the "stream" section of the solr admin UI.
>>> This
>>> >> is all on linux, with solr 7.1.0 and 7.2.1 (tried several versions in
>>> case
>>> >> it was a bug in one)
>>> >>
>>> >> I get back an error message:
>>> >> *sort param could not be parsed as a query, and is not a field that
>>> exists
>>> >> in the index: random_-255009774*
>>> >>
>>> >> I'm not passing in any sort field anywhere. But the solr logs show
>>> these
>>> >> three log entries:
>>> >>
>>> >> 2018-03-01 21:41:18.954 INFO  (qtp257513673-21) [c:tx_header s:shard1
>>> >> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.S.Request
>>> >> [tx_header_shard1_replica_n1]  webapp=/solr path=/select
>> >> params={q=*:*&_stateVer_=tx_header:6&fl=countyname&sort=random_-255009774+asc&rows=100&wt=javabin&version=2}
>> >> status=400 QTime=19
>>> >>
>>> >> 2018-03-01 21:41:18.966 ERROR (qtp257513673-17) [c:tx_header s:shard1
>>> >> r:core_node2 

Re: statistics in hitlist

2018-03-05 Thread Joel Bernstein
Let's break the expression down and build it up slowly. Let's start with:

let(echo="true",
 a=random(tx_prod_production, q="*:*", fq="isParent:true", rows="15",
fl="oil_first_90_days_production,oil_last_30_days_production"),
 b=col(a, oil_first_90_days_production))


This should return variables a and b. Let's see what the data looks like. I
changed the rows from 15000 to 15. If it all looks good we can expand the
rows and continue adding functions.




Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 5, 2018 at 4:11 PM, John Smith  wrote:

> Thanks Joel for your help on this.
>
> What I've done so far:
> - unzip downloaded solr-7.2
> - modify the _default "managed-schema" to add the random field type and
> the dynamic random field
> - start solr7 using "solr start -c"
> - indexed my data using pint/pdouble/boolean field types etc
>
> I can now run the random function all by itself, it returns random results
> as expected. So far so good!
>
> However... now trying to get the regression stuff working:
>
> let(a=random(tx_prod_production, q="*:*", fq="isParent:true",
> rows="15000", fl="oil_first_90_days_production,oil_last_30_days_production"),
> b=col(a, oil_first_90_days_production),
> c=col(a, oil_last_30_days_production),
> d=regress(b, c))
>
> Posted directly into solr admin UI. Run the streaming expression and I get
> this error message:
> "EXCEPTION": "Failed to evaluate expression regress(b,c) - Numeric value
> expected but found type java.lang.String for value
> oil_first_90_days_production"
>
> It thinks my numeric field is defined as a string? But when I view the
> schema, those 2 fields are defined as ints:
>
>
> When I run a normal query and choose xml as output format, then it also
> puts "int" elements into the hitlist, so the schema appears to be correct
> it's just when using this regress function that something goes wrong and
> solr thinks the field is string.
>
> Any suggestions?
> Thanks!
>
>
>
> On Thu, Mar 1, 2018 at 9:12 PM, Joel Bernstein  wrote:
>
>> The field type will also need to be in the schema:
>>
>> <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Thu, Mar 1, 2018 at 8:00 PM, Joel Bernstein 
>> wrote:
>>
>> > You'll need to have this field in your schema:
>> >
>> > <dynamicField name="random_*" type="random"/>
>> >
>> > I'll check to see if the default schema used with solr start -c has this
>> > field, if not I'll add it. Thanks for pointing this out.
>> >
>> > I checked and right now the random expression is only accepting one fq,
>> > but I consider this a bug. It should accept multiple. I'll create a ticket
>> > for getting this fixed.
>> >
>> >
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Thu, Mar 1, 2018 at 4:55 PM, John Smith 
>> wrote:
>> >
>> >> Joel, thanks for the pointers to the streaming feature. I had no idea
>> solr
>> >> had that (and also just discovered the very interesting sql feature! I
>> will
>> >> be sure to investigate that in more detail in the future).
>> >>
>> >> However I'm having some trouble getting basic streaming functions
>> working.
>> >> I've already figured out that I had to move to "solr cloud" instead of
>> >> "solr standalone" because I was getting errors about "cannot find zk
>> >> instance" or whatever which went away when using "solr start -c"
>> instead.
>> >>
>> >> But now I'm trying to use the random function since that was one of the
>> >> functions used in your example.
>> >>
>> >> random(tx_header, q="*:*", rows="100", fl="countyname")
>> >>
>> >> I posted that directly in the "stream" section of the solr admin UI.
>> This
>> >> is all on linux, with solr 7.1.0 and 7.2.1 (tried several versions in
>> case
>> >> it was a bug in one)
>> >>
>> >> I get back an error message:
>> >> *sort param could not be parsed as a query, and is not a field that
>> exists
>> >> in the index: random_-255009774*
>> >>
>> >> I'm not passing in any sort field anywhere. But the solr logs show
>> these
>> >> three log entries:
>> >>
>> >> 2018-03-01 21:41:18.954 INFO  (qtp257513673-21) [c:tx_header s:shard1
>> >> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.S.Request
>> >> [tx_header_shard1_replica_n1]  webapp=/solr path=/select
>> >> params={q=*:*&_stateVer_=tx_header:6&fl=countyname&sort=random_-255009774+asc&rows=100&wt=javabin&version=2}
>> >> status=400 QTime=19
>> >>
>> >> 2018-03-01 21:41:18.966 ERROR (qtp257513673-17) [c:tx_header s:shard1
>> >> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.s.i.CloudSolrClient
>> >> Request to collection [tx_header] failed due to (400)
>> >> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>> >> Error
>> >> from server at http://192.168.13.31:8983/solr/tx_header: sort param
>> could
>> >> not be parsed as a query, and is not a field that exists in the index:
>> >> random_-255009774, retry? 0
>> >>
>> >> 2018-03-01 21:41:18.968 ERROR (qtp257513673-17) [c:tx_header 

Re: Issues with CDCR in Solr 7.1

2018-03-05 Thread Tom Peters
You can ignore this. I think I found the issue (I was missing a block of XML in
the source config). I'm going to monitor it over the next day and see if it was
resolved.

> On Mar 5, 2018, at 4:29 PM, Tom Peters  wrote:
> 
> I'm trying to get Solr CDCR setup in Solr 7.1 and I'm having issues 
> post-bootstrap.
> 
> I have about 5,572,933 documents in the source cluster (index size is 3.77 
> GB). I'm enabling CDCR in the following manner:
> 
> 1. Delete the existing cluster in the target data center
>   admin/collections?action=DELETE&name=mycollection
> 
> 2. Stop indexing in source data center
> 
> 3. Do one final hard commit in source data center
>   update -d '{"commit":{}}'
> 
> 4. Create the cluster in the target datacenter
>   
> admin/collections?action=CREATE&name=mycollection&replicationFactor=1&collection.configName=myconfig
> 
>   Note: I'm only creating one replica initially because there is a bug 
> that prevents the bootstrap index from replicating to the replicas
> 
> 5. Disable the buffer in the target data center
>   cdcr?action=DISABLEBUFFER
> 
>   Note: the buffer has already been disabled in the source
> 
> 6. Start CDCR in the source data center
>   cdcr?action=START
> 
> 7. Monitor cdcr?action=BOOTSTRAP_STATUS and wait for complete message
>   NOTE: At this point I can confirm that the documents count in both the 
> source and target data centers are identical
> 
> 8. Re-enable indexing on source
> 
> 
> I'm not seeing any new documents in the target cluster, even after a commit. 
> The document count in the target does change, but it's nothing new. Looking 
> at the logs, I do see plenty of messages like:
>   SOURCE:
> 2018-03-05 21:20:06.290 INFO (qtp1595212853-65472) [c:mycollection 
> s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.c.S.Request 
> [mycollection_shard1_replica_n6] webapp=/solr path=/cdcr 
> params={action=LASTPROCESSEDVERSION&wt=javabin&version=2} status=0 QTime=0
> 2018-03-05 21:20:06.430 INFO 
> (cdcr-replicator-79-thread-2-processing-n:solr2-a:8080_solr) [ ] 
> o.a.s.h.CdcrReplicator Forwarded 128 updates to target mycollection
> 
>   TARGET:
> 2018-03-05 21:19:38.637 INFO (qtp1595212853-134) [c:mycollection 
> s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
> [mycollection_shard1_replica_n1] webapp=/solr path=/update 
> params={_stateVer_=mycollection:52&_version_=-1593959559286751241&cdcr.update=&wt=javabin&version=2}
>  status=0 QTime=0
> 
> 
> The weird thing though is that the lastTimestamp is from a couple days ago 
> when I query cdcr?action=QUEUES
> 
> {
>  "responseHeader": {
>"status": 0,
>"QTime": 24
>  },
>  "queues": [
>"zook01.be,zook02.be,zook03.be/solr",
>[
>  "mycollection",
>  [
>"queueSize",
>8685952,
>"lastTimestamp",
>"2018-03-03T23:07:14.179Z"
>  ]
>]
>  ],
>  "tlogTotalSize": 3458777355,
>  "tlogTotalCount": 5226,
>  "updateLogSynchronizer": "stopped"
> }
> 
> 
> Ultimately my questions are:
> 
> 1. Why am I not seeing updates in the target datacenter after bootstrapping 
> has completed?
> 
> 2. Is there anything I need to do to "reset" the bootstrap if I blow away the 
> target data center and start from scratch again.
> 
> 3. Am I missing anything?
> 
> Thanks for taking the time to read this.
> 
> 





Re: Solr 7.2.0 CDCR Issue with TLOG collections

2018-03-05 Thread Webster Homer
I noticed that the cdcr action=queues returns different results for the
target clouds. One target says that the updateLogSynchronizer is stopped,
the other says started. Why? What does that mean? We don't explicitly set
that anywhere.


{"responseHeader": {"status": 0, "QTime": 0}, "queues": [],
"tlogTotalSize": 0, "tlogTotalCount": 0, "updateLogSynchronizer": "stopped"}

and the other

{"responseHeader": {"status": 0, "QTime": 0}, "queues": [],
"tlogTotalSize": 22254206389, "tlogTotalCount": 2, "updateLogSynchronizer": "started"}

The source is as follows:
{
"responseHeader": {
"status": 0,
"QTime": 5
},
"queues": [
"xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,
xxx-mzk03.sial.com:2181/solr",
[
"b2b-catalog-material-180124T",
[
"queueSize",
0,
"lastTimestamp",
"2018-02-28T18:34:39.704Z"
]
],
"yyy-mzk01.sial.com:2181,yyy-mzk02.sial.com:2181,
yyy-mzk03.sial.com:2181/solr",
[
"b2b-catalog-material-180124T",
[
"queueSize",
0,
"lastTimestamp",
"2018-02-28T18:34:39.704Z"
]
]
],
"tlogTotalSize": 1970848,
"tlogTotalCount": 1,
"updateLogSynchronizer": "stopped"
}


On Fri, Mar 2, 2018 at 5:05 PM, Webster Homer 
wrote:

> It looks like the data is getting to the target servers. I see tlog files
> with the right timestamps. Looking at the timestamps on the documents in
> the collection, none of the data appears to have been loaded.
> In the solr.log I see lots of /cdcr messages  action=LASTPROCESSEDVERSION,
>  action=COLLECTIONCHECKPOINT, and  action=SHARDCHECKPOINT
>
> no errors
>
> autoCommit is set to 6. I tried sending a commit explicitly, no
> difference. CDCR is uploading data, but no new data appears in the
> collection.
>
> On Fri, Mar 2, 2018 at 1:39 PM, Webster Homer 
> wrote:
>
>> We have been having strange behavior with CDCR on Solr 7.2.0.
>>
>> We have a number of replicas which have identical schemas. We found that
>> TLOG replicas give much more consistent search results.
>>
>> We created a collection using TLOG replicas in our QA clouds.
>> We have a locally hosted solrcloud with 2 nodes, all our collections have
>> 2 shards. We use CDCR to replicate the collections from this environment to
>> 2 data centers hosted in Google cloud. This seems to work fairly well for
>> our collections with NRT replicas. However the new TLOG collection has
>> problems.
>>
>> The google cloud solrclusters have 4 nodes each (3 separate Zookeepers).
>> 2 shards per collection with 2 replicas per shard.
>>
>> We never see data show up in the cloud collections, but we do see tlog
>> files show up on the cloud servers. I can see that all of the servers have
>> cdcr started, buffers are disabled.
>> The cdcr source configuration is:
>>
>> "requestHandler":{"/cdcr":{
>>   "name":"/cdcr",
>>   "class":"solr.CdcrRequestHandler",
>>   "replica":[
>> {
>>   "zkHost":"xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xxx-mzk03.sial.com:2181/solr",
>>   "source":"b2b-catalog-material-180124T",
>>   "target":"b2b-catalog-material-180124T"},
>> {
>>   "zkHost":"-mzk01.sial.com:2181,-mzk02.sial.com:2181,
>> -mzk03.sial.com:2181/solr",
>>   "source":"b2b-catalog-material-180124T",
>>   "target":"b2b-catalog-material-180124T"}],
>>   "replicator":{
>> "threadPoolSize":4,
>> "schedule":500,
>> "batchSize":250},
>>   "updateLogSynchronizer":{"schedule":6
>>
>> The target configurations in the 2 clouds are the same:
>> "requestHandler":{"/cdcr":{ "name":"/cdcr", "class":
>> "solr.CdcrRequestHandler", "buffer":{"defaultState":"disabled"}}}
>>
>> All of our collections have a timestamp field, index_date. In the source
>> collection all the records have a date of 2/28/2018 but the target
>> collections have a latest date of 1/26/2018
>>
>> I don't see cdcr errors in the logs, but we use logstash to search them,
>> and we're still perfecting that.
>>
>> We have a number of similar collections that behave correctly. This is
>> the only collection that is a TLOG collection. It appears that CDCR doesn't
>> support TLOG collections.
>>
>> This begins to look like a bug
>>
>>
>


Issues with CDCR in Solr 7.1

2018-03-05 Thread Tom Peters
I'm trying to get Solr CDCR setup in Solr 7.1 and I'm having issues 
post-bootstrap.

I have about 5,572,933 documents in the source cluster (index size is 3.77 GB). 
I'm enabling CDCR in the following manner:

1. Delete the existing cluster in the target data center
admin/collections?action=DELETE&name=mycollection

2. Stop indexing in source data center

3. Do one final hard commit in source data center
update -d '{"commit":{}}'

4. Create the cluster in the target datacenter

admin/collections?action=CREATE&name=mycollection&replicationFactor=1&collection.configName=myconfig

Note: I'm only creating one replica initially because there is a bug 
that prevents the bootstrap index from replicating to the replicas

5. Disable the buffer in the target data center
cdcr?action=DISABLEBUFFER

Note: the buffer has already been disabled in the source

6. Start CDCR in the source data center
cdcr?action=START

7. Monitor cdcr?action=BOOTSTRAP_STATUS and wait for complete message
NOTE: At this point I can confirm that the documents count in both the 
source and target data centers are identical

8. Re-enable indexing on source


I'm not seeing any new documents in the target cluster, even after a commit. 
The document count in the target does change, but it's nothing new. Looking at 
the logs, I do see plenty of messages like:
SOURCE:
  2018-03-05 21:20:06.290 INFO (qtp1595212853-65472) [c:mycollection 
s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.c.S.Request 
[mycollection_shard1_replica_n6] webapp=/solr path=/cdcr 
params={action=LASTPROCESSEDVERSION&wt=javabin&version=2} status=0 QTime=0
  2018-03-05 21:20:06.430 INFO 
(cdcr-replicator-79-thread-2-processing-n:solr2-a:8080_solr) [ ] 
o.a.s.h.CdcrReplicator Forwarded 128 updates to target mycollection

TARGET:
  2018-03-05 21:19:38.637 INFO (qtp1595212853-134) [c:mycollection 
s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
[mycollection_shard1_replica_n1] webapp=/solr path=/update 
params={_stateVer_=mycollection:52&_version_=-1593959559286751241&cdcr.update=&wt=javabin&version=2}
 status=0 QTime=0


The weird thing though is that the lastTimestamp is from a couple days ago when 
I query cdcr?action=QUEUES

{
  "responseHeader": {
"status": 0,
"QTime": 24
  },
  "queues": [
"zook01.be,zook02.be,zook03.be/solr",
[
  "mycollection",
  [
"queueSize",
8685952,
"lastTimestamp",
"2018-03-03T23:07:14.179Z"
  ]
]
  ],
  "tlogTotalSize": 3458777355,
  "tlogTotalCount": 5226,
  "updateLogSynchronizer": "stopped"
}


Ultimately my questions are:

1. Why am I not seeing updates in the target datacenter after bootstrapping has 
completed?

2. Is there anything I need to do to "reset" the bootstrap if I blow away the 
target data center and start from scratch again.

3. Am I missing anything?

Thanks for taking the time to read this.




Re: statistics in hitlist

2018-03-05 Thread John Smith
Thanks Joel for your help on this.

What I've done so far:
- unzip downloaded solr-7.2
- modify the _default "managed-schema" to add the random field type and the
dynamic random field
- start solr7 using "solr start -c"
- indexed my data using pint/pdouble/boolean field types etc

I can now run the random function all by itself, it returns random results
as expected. So far so good!

However... now trying to get the regression stuff working:

let(a=random(tx_prod_production, q="*:*", fq="isParent:true", rows="15000",
fl="oil_first_90_days_production,oil_last_30_days_production"),
b=col(a, oil_first_90_days_production),
c=col(a, oil_last_30_days_production),
d=regress(b, c))

Posted directly into solr admin UI. Run the streaming expression and I get
this error message:
"EXCEPTION": "Failed to evaluate expression regress(b,c) - Numeric value
expected but found type java.lang.String for value
oil_first_90_days_production"

It thinks my numeric field is defined as a string? But when I view the
schema, those 2 fields are defined as ints:


When I run a normal query and choose xml as output format, then it also
puts "int" elements into the hitlist, so the schema appears to be correct
it's just when using this regress function that something goes wrong and
solr thinks the field is string.

Any suggestions?
Thanks!


On Thu, Mar 1, 2018 at 9:12 PM, Joel Bernstein  wrote:

> The field type will also need to be in the schema:
>
> <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Mar 1, 2018 at 8:00 PM, Joel Bernstein  wrote:
>
> > You'll need to have this field in your schema:
> >
> > <dynamicField name="random_*" type="random"/>
> >
> > I'll check to see if the default schema used with solr start -c has this
> > field, if not I'll add it. Thanks for pointing this out.
> >
> > I checked and right now the random expression is only accepting one fq,
> > but I consider this a bug. It should accept multiple. I'll create a ticket
> > for getting this fixed.
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Mar 1, 2018 at 4:55 PM, John Smith  wrote:
> >
> >> Joel, thanks for the pointers to the streaming feature. I had no idea
> solr
> >> had that (and also just discovered the very interesting sql feature! I
> will
> >> be sure to investigate that in more detail in the future).
> >>
> >> However I'm having some trouble getting basic streaming functions
> working.
> >> I've already figured out that I had to move to "solr cloud" instead of
> >> "solr standalone" because I was getting errors about "cannot find zk
> >> instance" or whatever which went away when using "solr start -c"
> instead.
> >>
> >> But now I'm trying to use the random function since that was one of the
> >> functions used in your example.
> >>
> >> random(tx_header, q="*:*", rows="100", fl="countyname")
> >>
> >> I posted that directly in the "stream" section of the solr admin UI.
> This
> >> is all on linux, with solr 7.1.0 and 7.2.1 (tried several versions in
> case
> >> it was a bug in one)
> >>
> >> I get back an error message:
> >> *sort param could not be parsed as a query, and is not a field that
> exists
> >> in the index: random_-255009774*
> >>
> >> I'm not passing in any sort field anywhere. But the solr logs show these
> >> three log entries:
> >>
> >> 2018-03-01 21:41:18.954 INFO  (qtp257513673-21) [c:tx_header s:shard1
> >> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.S.Request
> >> [tx_header_shard1_replica_n1]  webapp=/solr path=/select
> >> params={q=*:*&_stateVer_=tx_header:6&fl=countyname&sort=random_-255009774+asc&rows=100&wt=javabin&version=2}
> >> status=400
> >> QTime=19
> >>
> >> 2018-03-01 21:41:18.966 ERROR (qtp257513673-17) [c:tx_header s:shard1
> >> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.s.i.CloudSolrClient
> >> Request to collection [tx_header] failed due to (400)
> >> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> >> Error
> >> from server at http://192.168.13.31:8983/solr/tx_header: sort param
> could
> >> not be parsed as a query, and is not a field that exists in the index:
> >> random_-255009774, retry? 0
> >>
> >> 2018-03-01 21:41:18.968 ERROR (qtp257513673-17) [c:tx_header s:shard1
> >> r:core_node2 x:tx_header_shard1_replica_n1]
> o.a.s.c.s.i.s.ExceptionStream
> >> java.io.IOException:
> >> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> >> Error
> >> from server at http://192.168.13.31:8983/solr/tx_header: sort param
> could
> >> not be parsed as a query, and is not a field that exists in the index:
> >> random_-255009774
> >>
> >>
> >> So basically it looks like solr is injecting the "sort=random_" stuff
> into
> >> my query and of course that is failing on the search since that
> >> field/column doesn't exist in my schema. Every time I run the random
> >> function, I get a slightly different field name that it injects, but
> they
> >> all start with "random_" etc.
> >>
> >> I have tried 

Re: Performance Implications of Different Routing Schemes

2018-03-05 Thread Stephen Lewis
Hi!

Thank you for your thoughtful response, Shawn! I have a couple of follow-up
questions and points to check my understanding on.

Thanks for explaining my misunderstanding on implicit routing. So to repeat
back and check my understanding: implicit routing may be either left up to
SOLR to distribute, or you may specify the router.field parameter at
collection creation time. If there is no router.field parameter specified,
SOLR will distribute documents to shards based solely on a hash of the
docID; if router.field is defined, SOLR will distribute documents to shards
based solely on the hash of the router.field value on the doc. Correct?

So let me focus a bit more on the composite ID router. The options
available here are:

   - single prepend routing (tenant1!id1)
   - multi prepend routing (tenant1!piece1!id1)
   - num-bits prepend routing (tenant1/B!id1)

I think the first two are relatively straightforward; the ask is on the
application layer to supply one or two prepends, and then SOLR will find an
appropriate shard to host the document based on a hash of the prepend(s).

I'm very interested though by the num-bits prepend. (By the way, I never
found an agreed-upon name for this, so let me know if there is something
standard I should call this). Originally when I wrote I had a
misunderstanding here, but I believe I've understood it fully now. If B is
your "bits" param, then a tenant will be spread out over 1/(2^B) fraction
of the shards; so if B = 0, any shard may end up hosting the doc; if B = 1,
half of the shards may be the one to host the doc; if B = 2, one quarter of
the shards may be the one to host the doc; etc

I was still a bit uncertain about the mechanism until I looked deeper into
the documentation and an article on the topic: "[The composite ID router]
creates a composite 32 bit hash by taking 16 bits from the shard key's hash
and 16 bits from the document id's hash. When a tenant is too large to fit on
a single shard it can be spread across multiple shards by specifying the
number of bits to use from the shard key. The syntax for this is:
shard_key/num!document_id. The /num is the number of bits from the shard key
to use in the composite hash." So I think this means that first the hashes of
each part are computed, and then the bits are
taken from the resultant hash values. Is that correct? So I believe that
means that if num = 16, then that would be the same as omitting the /num
param. I believe it also implies that if the number of shards is not a
power of 2, some irregularity in the number of shards per tenant will be seen;
e.g., if there is a collection with 11 shards and num = 2; 11/2^2 = 2.75;
then every tenant will live on at least 3 shards, some may end up living on
4 depending on exactly how the ranges work out.
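
If that reading is right, the combination step can be sketched roughly like
this (not Solr's actual source; Solr hashes each part on its own, with
MurmurHash3 I believe, and this only shows how the two 32-bit hashes would be
combined):

    public class CompositeIdSketch {
        /** bits is the B in "tenant/B!doc"; 16 is used when no /B is given. */
        static int compositeHash(int shardKeyHash, int docIdHash, int bits) {
            // mask selects the top `bits` bits of the shard key's hash;
            // the document id's hash fills in the remaining low bits
            int mask = (bits == 0) ? 0 : (-1 << (32 - bits));
            return (shardKeyHash & mask) | (docIdHash & ~mask);
        }
    }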

Some of what implicit routing offers could be quite desirable if the
document shard-routing ever needs to be updated. Specifically, the ID of
the document could remain constant even when making updates to the routing
key, which I believe would allow in-place updates to a document which
changes its host shard. So for example, if I create a collection using
implicit routing and router.field=shard_key, then I can insert a document
with id=1 and shard_key=1, then later insert a document with id=1 and
shard_key=2, and the original document sent with shard_key = 1 will be
automatically deleted on insert of the new document. Can you confirm
whether this is true? Or would the original document with id = 1 and
shard_key = 1 not necessarily be deleted?

The only drawback then I see of using implicit routing would be that there
is no equivalent to the num-bits prepend: if you want to spread out your
documents associated with a given tenant across fewer than all but more
than one shard, it falls back to the developer to update the shard key. Is
this correct, or is there an equivalent notion to num-bits for implicit
routing?

There's one other thing I want to drill down on a bit more with resource
usage:


>
>
> *Query performance should be about the same for any routing type.  It does
> look like when you use compositeId and actually implement shard keys, you
> can limit queries to those shards, but a *general* query is going to hit
> all shards.*
>

Currently I have experience targeting one shard at a time with queries. My
goal in architecture here is to be a little more flexible, and instead keep
the number of shards a given query has to hit approximately constant even
as the user base and solr cloud grow. I believe that will keep CPU sprawl
at a minimum, more on that below.

>
>
>
>
>
> * If your query rate is very low (or shards are distributed across a lot
> of hardware that has significant spare CPU capacity) performance isn't
> going to be dramatically different for a query that hits 2 shards versus
> one that hits 6 shards.  If your query rate is 

Re: /var/solr/data has lots of index* directories

2018-03-05 Thread Tom Peters
Thanks. I went ahead and did that.

I think the multiple directories stemmed from an issue I sent to the list a 
week or two ago about deleteByQueries knocking my replicas offline.

> On Mar 5, 2018, at 1:44 PM, Shalin Shekhar Mangar  
> wrote:
> 
> You can look inside the index.properties. The directory name mentioned in
> that properties file is the one being used actively. The rest are old
> directories that should be cleaned up on Solr restart but you can delete
> them yourself without any issues.
> 
> On Mon, Mar 5, 2018 at 11:43 PM, Tom Peters  wrote:
> 
>> While trying to debug an issue with CDCR, I noticed that the
>> /var/solr/data directories on my source cluster have wildly different sizes.
>> 
>>  % for i in solr2-{a..e}; do echo -n "$i: "; ssh -A $i du -sh
>> /var/solr/data; done
>>  solr2-a: 9.5G   /var/solr/data
>>  solr2-b: 29G    /var/solr/data
>>  solr2-c: 6.6G   /var/solr/data
>>  solr2-d: 9.7G   /var/solr/data
>>  solr2-e: 19G    /var/solr/data
>> 
>> The leader is currently "solr2-a"
>> 
>> Here's the actual index size:
>> 
>>  Master (Searching)
>>  1520273178244 # version
>>  73034 # gen
>>  3.66 GB   # size
>> 
>> When I look inside /var/solr/data/ on solr2-b, I see a bunch of index.*
>> directories:
>> 
>>  % ls | grep index
>>  index.20180223021742634
>>  index.20180223024901983
>>  index.20180223033852960
>>  index.20180223034811193
>>  index.20180223035648403
>>  index.20180223041040318
>>  index.properties
>> 
>> On solr2-a, I only see one index directory (index.20180222192820572).
>> 
>> Does anyone know why this will happen and how I can clean it up without
>> potentially causing any issues? We're currently on version Solr 7.1.
>> 
>> 
>> 
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.





Re: /var/solr/data has lots of index* directories

2018-03-05 Thread Shalin Shekhar Mangar
You can look inside the index.properties. The directory name mentioned in
that properties file is the one being used actively. The rest are old
directories that should be cleaned up on Solr restart but you can delete
them yourself without any issues.
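
For example (directory name made up), index.properties is a plain Java
properties file, so it should contain something like:

    index=index.20180223041040318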

On Mon, Mar 5, 2018 at 11:43 PM, Tom Peters  wrote:

> While trying to debug an issue with CDCR, I noticed that the
> /var/solr/data directories on my source cluster have wildly different sizes.
>
>   % for i in solr2-{a..e}; do echo -n "$i: "; ssh -A $i du -sh
> /var/solr/data; done
>   solr2-a: 9.5G   /var/solr/data
>   solr2-b: 29G    /var/solr/data
>   solr2-c: 6.6G   /var/solr/data
>   solr2-d: 9.7G   /var/solr/data
>   solr2-e: 19G    /var/solr/data
>
> The leader is currently "solr2-a"
>
> Here's the actual index size:
>
>   Master (Searching)
>   1520273178244 # version
>   73034 # gen
>   3.66 GB   # size
>
> When I look inside /var/solr/data/ on solr2-b, I see a bunch of index.*
> directories:
>
>   % ls | grep index
>   index.20180223021742634
>   index.20180223024901983
>   index.20180223033852960
>   index.20180223034811193
>   index.20180223035648403
>   index.20180223041040318
>   index.properties
>
> On solr2-a, I only see one index directory (index.20180222192820572).
>
> Does anyone know why this will happen and how I can clean it up without
> potentially causing any issues? We're currently on version Solr 7.1.
>
>
>



-- 
Regards,
Shalin Shekhar Mangar.


/var/solr/data has lots of index* directories

2018-03-05 Thread Tom Peters
While trying to debug an issue with CDCR, I noticed that the /var/solr/data 
directories on my source cluster have wildly different sizes.

  % for i in solr2-{a..e}; do echo -n "$i: "; ssh -A $i du -sh /var/solr/data; 
done
  solr2-a: 9.5G   /var/solr/data
  solr2-b: 29G    /var/solr/data
  solr2-c: 6.6G   /var/solr/data
  solr2-d: 9.7G   /var/solr/data
  solr2-e: 19G    /var/solr/data

The leader is currently "solr2-a"

Here's the actual index size:

  Master (Searching)
  1520273178244 # version
  73034 # gen
  3.66 GB   # size

When I look inside /var/solr/data/ on solr2-b, I see a bunch of index.* 
directories:

  % ls | grep index
  index.20180223021742634
  index.20180223024901983
  index.20180223033852960
  index.20180223034811193
  index.20180223035648403
  index.20180223041040318
  index.properties

On solr2-a, I only see one index directory (index.20180222192820572).

Does anyone know why this will happen and how I can clean it up without 
potentially causing any issues? We're currently on version Solr 7.1.




Re: Expected mime type application/octet-stream but got text/html

2018-03-05 Thread Jeff Dyke
I'm not sure where your documents are coming from, but I would expect this
from a 403/404 in an S3 bucket if the permissions were not correct.

Ultimately, though, Walter's last sentence is the best next step.

On Mon, Mar 5, 2018 at 12:38 PM, Walter Underwood 
wrote:

> I presume this error is from SolrJ.
>
> SolrJ has requested responses in javabin format, so it uses that parser.
> When there is an error, often a 503 (timeout), the body of the HTTP
> response is in HTML. When that happens, this error results.
>
> This was fixed in SolrJ sometime in the 5.x releases.
>
> To figure out the error, look at the Solr logs for a non-200 response code.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Mar 5, 2018, at 9:05 AM, mustafiz  wrote:
> >
> > Hi Dear,
> > I am facing exactly the same issue with my DSpace 6.1 but have not been
> > able to solve it after 3 days of work, as I am new to DSpace. Can you
> > please explain in detail how I can get rid of it?
> >
> > Thanks
> >
> >
> >
>
>


Re: Expected mime type application/octet-stream but got text/html

2018-03-05 Thread Walter Underwood
I presume this error is from SolrJ.

SolrJ has requested responses in javabin format, so it uses that parser. When 
there is an error, often a 503 (timeout), the body of the HTTP response is in 
HTML. When that happens, this error results.

This was fixed in SolrJ sometime in the 5.x releases.

To figure out the error, look at the Solr logs for a non-200 response code.
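
For example, something like this against the request log should surface them
(log path varies by install):

    grep -E 'status=[45][0-9][0-9]' /var/solr/logs/solr.log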

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 5, 2018, at 9:05 AM, mustafiz  wrote:
> 
> Hi Dear,
> I am facing exactly the same issue with my DSpace 6.1 but have not been able
> to solve it after 3 days of work, as I am new to DSpace. Can you please
> explain in detail how I can get rid of it?
> 
> Thanks
> 
> 
> 



Re: Expected mime type application/octet-stream but got text/html

2018-03-05 Thread mustafiz
Hi Dear,
I am facing exactly the same issue with my DSpace 6.1 but have not been able
to solve it after 3 days of work, as I am new to DSpace. Can you please
explain in detail how I can get rid of it?

Thanks





Alias field names when searching (not for results)

2018-03-05 Thread Christopher Schultz

All,

I'd like for users to be able to search a field by multiple names
without performing a "copy-field" when analyzing a document. Is that
possible? Whenever I search for "solr alias field" I get results about
how to re-name fields in the results.

Here's what I'd like to do. Let's say I have a document:

{
  id: 1234,
  field_1: valueA,
  field_2: valueB,
  field_3: valueC
}

I'd like users to be able to find this document using any of the
following queries:

   field_1:valueA
   f1:valueA
   1:valueA

I just want the query parser to say "oh, 'f1' is an alias for
'field_1'" and substitute that when performing the search. Is that
possible?

-chris



Re: Updating documents and commit/rollback

2018-03-05 Thread Christopher Schultz

Shawn,

On 3/2/18 7:46 PM, Shawn Heisey wrote:
> On 3/2/2018 10:39 AM, Christopher Schultz wrote:
>> The problem is that I'm updating the index after my SQL UPDATE(s)
>> have run, but before my SQL COMMIT occurs. I have had a problem
>> where the SQL fails and rolls-back, but the solrClient is not
>> rolled-back.
>> 
>> I'm a little wary of rolling-back Solr because, as I understand
>> it, the client itself doesn't carry any transactional
>> information. That is, it should be a shared-resource (within the
>> web application) and indeed, other clients could be connecting
>> from other places (like other app servers running the same
>> application). Performing either commit() or rollback() on the
>> Solr client will commit/rollback *all* writes since the last
>> commit, right?
> 
> Correct.  Relational databases typically keep track of transactions
> on one connection separately from transactions on another
> connection, and can roll one of them back without affecting the
> others.
> 
> Solr doesn't have this capability.  The reason that it doesn't have
> this capability is that Lucene doesn't have it, and the majority of
> Solr functionality is provided by Lucene.
> 
> If updates are happening concurrently from multiple sources, then 
> there's no way to have any kind of meaningful rollback.
> 
> I see two solutions:
> 
> 1) Funnel all updates through a single thread/process, which will
> not move on from one update to another until the final decision is
> made about that update.  Then rolling back becomes possible,
> because there is only one source for updates.  The disadvantage
> here is that this thread/process becomes a bottleneck, and
> performance may suffer greatly.  Also, it can be a single point of
> failure.  If the rate of updates is low, then the bottleneck may
> not be a problem.
> 
> 2) Have your updating software revert the changes "manually" in 
> situations where the SQL change is rolled back ... by either
> deleting the record or sending another update to change values back
> to what they were before.

Yeah, technique #2 was the only thing I could come up with that made
any sense. Serializing updates is probably more trouble than it's worth.

In an environment where I'd probably expect to have maybe 50 - 100
"writes" daily to a Solr core, how do you recommend commits be done?
The documents are quite small (user metadata like username, first/last
and email). Can I add/commit simultaneously? There seems to be no
reason to perform separate add/commit steps in this scenario.
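
For reference, SolrJ can attach a commitWithin directly to the add, avoiding a
separate commit() call entirely; a minimal sketch, assuming a shared
SolrClient instance and hypothetical field names:

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1234");
    doc.addField("username", "jdoe");
    solrClient.add(doc, 10000); // commitWithin of 10 seconds, no explicit commit()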

-chris


Re: SolrCloud 7.2.1 - UnsupportedOperationException thrown after query on specific environments

2018-03-05 Thread Andy Jolly
We were able to locate the exact issue after some more digging.  We added a
query to another collection that runs alongside the job we were executing
and we were missing the collection reference in the URL.  If the below query
is run by itself in at least Solr 7, the error will be reproduced.
http://localhost:8983/solr//select?q=*:*

Since the collection was left empty, collectionsList in HttpSolrCall.java
was being set to an immutable Collections.emptyList() by the
resolveCollectionListOrAlias method.  Then, when
collectionsList.add(collectionName) was called in the getRemotCoreUrl method,
the error we are seeing was thrown, since it tries to add to an immutable
list.
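
The underlying Java behavior, reproduced in isolation:

    import java.util.Collections;
    import java.util.List;

    public class EmptyListDemo {
        public static void main(String[] args) {
            List<String> collections = Collections.emptyList(); // immutable
            collections.add("mycollection"); // throws UnsupportedOperationException
        }
    }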





Re: Accessing a specific solr collection from custom searchHandler

2018-03-05 Thread Gintautas Sulskus
Thanks for the reply, Emir.

I would like to use the server-side Java API to extend RequestHandlerBase
and invoke other request handlers and queries. An example of such an
implementation can be found here:
http://sujitpal.blogspot.co.uk/2011/02/solr-custom-search-requesthandler.html
However, this example does not point to more detailed API documentation or a
dev manual, and does not show how I could invoke cross-collection queries.

Regards,
Gintautas

On Mon, Mar 5, 2018 at 1:46 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Gintautas,
> I suggest that you start from some existing SearchHandler that does
> something similar to what you are after or some simple handler. Also check
> if handler or SearchComponent is better base for your feature.
> Here is a blog post describing how to write, build and deploy custom Solr
> extension (ignore part about the problem that update request processor
> solves): http://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 5 Mar 2018, at 13:07, Gintautas Sulskus 
> wrote:
> >
> > I would like to write a searchHandler for complex cross-collection
> queries.
> >
> > On Mon, Mar 5, 2018 at 12:05 PM, Gintautas Sulskus <
> > gintautas.suls...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> How do I access a different collection from a custom searchHandler?
> >> Is there any documentation on custom component (e.g. searchHandler)
> >> development?
> >>
> >> Regards,
> >> Gintas
> >>
>
>


Re: Solr CDCR doesn't work if the authentication is enabled

2018-03-05 Thread Amrit Sarkar
Nice. Can you please post the details on the JIRA too if possible:
https://issues.apache.org/jira/browse/SOLR-11959 and we can probably put up
a small patch adding this bit of information to the official documentation.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, Mar 5, 2018 at 8:11 PM, dimaf  wrote:

> To resolve the issue, I added names of Source node to /live_nodes of
> Target.
> https://stackoverflow.com/questions/48790621/solr-cdcr-doesnt-work-if-the-authentication-is-enabled
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr CDCR doesn't work if the authentication is enabled

2018-03-05 Thread dimaf
To resolve the issue, I added the names of the Source nodes to /live_nodes of
the Target.
https://stackoverflow.com/questions/48790621/solr-cdcr-doesnt-work-if-the-authentication-is-enabled





RE: solr url control

2018-03-05 Thread Becky Bonner
Thank you for your response.  Our dev instance is not a cloud, but we will be
implementing cloud in our staging and production environments.  I was afraid
you were going to tell me that the substructure was not supported. I was hoping
that with core autodiscovery it would keep the path.  Thanks for your help.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Friday, March 2, 2018 6:45 PM
To: solr-user@lucene.apache.org
Subject: Re: solr url control

On 3/2/2018 10:29 AM, Becky Bonner wrote:
> We are trying to setup one solr server for several applications each with a 
> different collection.  Is there a way to have have 2 collections under one 
> folder and the url be something like this:
> http://mysolrinstance.com/solr/myParent1/collection1
> http://mysolrinstance.com/solr/myParent1/collection2
> http://mysolrinstance.com/solr/myParent2
> http://mysolrinstance.com/solr/myParent3

No. I am not aware of any way to set up a hierarchy like this. Collections and 
cores have one identifier for their names.  You could use myparent1_collection1 
as a name.

Implementing such a hierarchy like this would likely be difficult for the dev 
team, and would probably be a large source of bugs for several releases after 
it first became available.  I don't think a feature like this is likely to 
happen.

Later, you said "We would not want the data from one collection to ever show up 
in another collection query."  That's not ever going to happen unless the 
software making the query explicitly requests it, and it will need to know 
details about the indexes in your Solr server to be able to do it successfully. 
 FYI: People who cannot be trusted shouldn't ever have direct access to your 
Solr installation.

Are you running SolrCloud?  I ask because if you're not, then the terminology 
for each index isn't a "collection" ... it's a core.  This is a pedantic 
statement, but you'll get better answers if your terminology is correct.

If you were running SolrCloud, it would be extremely unlikely for you to have a 
directory structure like you describe.  SolrCloud normally handles all core 
creation behind the scenes and isn't going to set up a directory structure like 
that.

Information about how core discovery works:

https://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29#Finding_cores

Thanks,
Shawn



Solr Cloud: query elevation + deduplication?

2018-03-05 Thread Ronja Koistinen
Hello,

I am running Solr Cloud 6.6.2 and trying to get query elevation and
deduplication (with SignatureUpdateProcessor) working at the same time.

The documentation for deduplication
(https://lucene.apache.org/solr/guide/6_6/de-duplication.html) does not
specify if the signatureField needs to be the uniqueKey field configured
in my schema.xml. Currently I have my uniqueKey set to the field
containing the url of my documents.

The query elevation seems to reference documents by the uniqueKey in the
"id" attributes listed in elevate.xml, so having the uniqueKey be the
url would be beneficial to my process of maintaining the query elevation
list.
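
For reference, the SignatureUpdateProcessorFactory chain can point
signatureField at a dedicated field (the documented default name is
"signature") rather than at the uniqueKey; an illustrative chain, with a
made-up field list:

    <updateRequestProcessorChain name="dedupe">
      <processor class="solr.processor.SignatureUpdateProcessorFactory">
        <bool name="enabled">true</bool>
        <str name="signatureField">signature</str>
        <bool name="overwriteDupes">false</bool>
        <str name="fields">name,features,cat</str>
        <str name="signatureClass">solr.processor.Lookup3Signature</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>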

Also, what is the status of this issue I found?
https://issues.apache.org/jira/browse/SOLR-3473

-- 
Ronja Koistinen
University of Helsinki





Re: Accessing a specific solr collection from custom searchHandler

2018-03-05 Thread Emir Arnautović
Hi Gintautas,
I suggest that you start from some existing SearchHandler that does something 
similar to what you are after or some simple handler. Also check if handler or 
SearchComponent is better base for your feature.
Here is a blog post describing how to write, build and deploy custom Solr 
extension (ignore part about the problem that update request processor solves): 
http://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html 


HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 5 Mar 2018, at 13:07, Gintautas Sulskus  
> wrote:
> 
> I would like to write a searchHandler for complex cross-collection queries.
> 
> On Mon, Mar 5, 2018 at 12:05 PM, Gintautas Sulskus <
> gintautas.suls...@gmail.com> wrote:
> 
>> Hi,
>> 
>> How do I access a different collection from a custom searchHandler?
>> Is there any documentation on custom component (e.g. searchHandler)
>> development?
>> 
>> Regards,
>> Gintas
>> 



Re: Accessing a specific solr collection from custom searchHandler

2018-03-05 Thread Gintautas Sulskus
I would like to write a searchHandler for complex cross-collection queries.

On Mon, Mar 5, 2018 at 12:05 PM, Gintautas Sulskus <
gintautas.suls...@gmail.com> wrote:

> Hi,
>
> How do I access a different collection from a custom searchHandler?
> Is there any documentation on custom component (e.g. searchHandler)
> development?
>
> Regards,
> Gintas
>


Accessing a specific solr collection from custom searchHandler

2018-03-05 Thread Gintautas Sulskus
Hi,

How do I access a different collection from a custom searchHandler?
Is there any documentation on custom component (e.g. searchHandler)
development?

Regards,
Gintas


Solr SynonymGraphFilterFactory error on import

2018-03-05 Thread damian.pawski
After upgrading to Solr 7.2, the import started to log errors for some documents.

Field that returns errors:

(fieldType definition with index-time and query-time analyzer chains, stripped
by the mailing-list archive)

During the import, the below error is returned for some of the records:

org.apache.solr.common.SolrException: Exception writing document id X to
the index; possible analysis error: startOffset must be non-negative, and
endOffset must be >= startOffset, and offsets must not go backwards
startOffset=2874,endOffset=2878,lastStartOffset=2879 for field 'X'
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:226)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:936)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:616)
at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:80)


It is related to the:

<filter class="solr.SynonymGraphFilterFactory" ... />

If I remove this it works fine. Previously we were using:

<filter class="solr.SynonymFilterFactory" ... />

and it was working fine, but SynonymFilterFactory is no longer supported on
Solr 7.X; it has been replaced with SynonymGraphFilterFactory, and I have
added FlattenGraphFilterFactory as suggested.
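
For reference, the documented pattern applies FlattenGraphFilterFactory only
in the index-time analyzer, immediately after the graph filter (the tokenizer
and synonyms file below are placeholders):

    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"/>
    </analyzer>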

I am not sure why Solr returns these errors.

Thank you in advance for suggestions.


