Re: CustomScoreProvider Sucks, Need Help

2018-02-23 Thread ~$alpha`
But why? When I am using a customized sort that does nothing but return, it should work fast.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: autosuggestion indexing in a solr cluster

2018-02-23 Thread Deepak Udapudi

Using the solr /collection/suggest?suggest=true&suggest.build=true URL for 
populating the index.

Regards,
Deepak


-Original Message-
From: Deepak Udapudi [mailto:dudap...@delta.org]
Sent: Friday, February 23, 2018 5:42 PM
To: solr-user@lucene.apache.org
Cc: Balakrishna Sudabathula ; Anupama Pullela 
; Segar Soundiramourthy 
Subject: autosuggestion indexing in a solr cluster

Hi all,

We are using a Solr cluster.
We have Solr configuration for auto-suggestions as shown below.



<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">Specialty</str>
    <str name="contextField">specialty</str>
    <str name="payloadField">specialty</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">specialty_suggester_infix_dir</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">specialty</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">true</str>
    <str name="buildOnCommit">true</str>
    <int name="minPrefixChars">2</int>
    <str name="storeDir">specialty_provider_suggestor_dictionary</str>
    <str name="fieldType">string</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">5</str>
    <str name="suggest.dictionary">Specialty</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>




We are using the URL 
"http://aw-lx0092:8984/solr/provider_collection_deepak/suggest?suggest=true&suggest.build=true"
 to build the index.
We see that the index that is built for auto-suggestions is not propagated to 
other solr instances in the cluster.
We need the auto-suggestion indices to propagate to all the Solr instances 
automatically.
Any suggestions or solutions would be appreciated.

Thanks,
Deepak




The information contained in this email message and any attachments is 
confidential and intended only for the addressee(s). If you are not an 
addressee, you may not copy or disclose the information, or act upon it, and 
you should delete it entirely from your email system. Please notify the sender 
that you received this email in error.




autosuggestion indexing in a solr cluster

2018-02-23 Thread Deepak Udapudi
Hi all,

We are using a Solr cluster.
We have Solr configuration for auto-suggestions as shown below.



<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">Specialty</str>
    <str name="contextField">specialty</str>
    <str name="payloadField">specialty</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">specialty_suggester_infix_dir</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">specialty</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">true</str>
    <str name="buildOnCommit">true</str>
    <int name="minPrefixChars">2</int>
    <str name="storeDir">specialty_provider_suggestor_dictionary</str>
    <str name="fieldType">string</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">5</str>
    <str name="suggest.dictionary">Specialty</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>




We are using the URL 
"http://aw-lx0092:8984/solr/provider_collection_deepak/suggest?suggest=true&suggest.build=true"
 to build the index.
We see that the index that is built for auto-suggestions is not propagated to 
other solr instances in the cluster.
We need the auto-suggestion indices to propagate to all the Solr instances 
automatically.
Any suggestions or solutions would be appreciated.

Thanks,
Deepak
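Since the suggester's sidecar Lucene index is built on local disk and is not
replicated between instances, a common workaround is to fire the build request
at every node individually. A minimal sketch (host names beyond the one in the
original URL are hypothetical):

```python
# Build the suggester's sidecar index on every node explicitly, since it
# lives on local disk and is not replicated between Solr instances.
from urllib.parse import urlencode

nodes = ["aw-lx0092:8984", "aw-lx0093:8984", "aw-lx0094:8984"]  # hypothetical
collection = "provider_collection_deepak"

def build_urls(nodes, collection):
    """One suggest-build URL per node; distrib=false stops the request
    from being forwarded to another replica."""
    query = urlencode({"suggest": "true",
                       "suggest.build": "true",
                       "distrib": "false"})
    return [f"http://{host}/solr/{collection}/suggest?{query}"
            for host in nodes]

urls = build_urls(nodes, collection)
# In practice, issue a GET against each URL (e.g. with the requests library).
```

Alternatively, buildOnCommit=true lets each replica rebuild its own suggester
after commits, at the cost of extra work per commit.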






Re: custom unique numeric id

2018-02-23 Thread Clay McDonald
Thank you, Clay

> On Feb 23, 2018, at 6:29 PM, Shawn Heisey  wrote:
> 
>> On 2/23/2018 2:57 PM, Clay McDonald wrote:
>> I'm new to Solr/Lucene and I'd like to know if there is a way to auto-create 
>> a unique numeric id in a custom field that we can then reference when making 
>> calls to the index from Python. It seems that using a numeric id 
>> would speed up our calls to and from Solr from our PySpark ML application.
> 
> This should do it:
> 
> https://wiki.apache.org/solr/UniqueKey#UUID_techniques
> 
> Thanks,
> Shawn
> 


Re: custom unique numeric id

2018-02-23 Thread Shawn Heisey
On 2/23/2018 2:57 PM, Clay McDonald wrote:
> I'm new to Solr/Lucene and I'd like to know if there is a way to auto-create 
> a unique numeric id in a custom field that we can then reference when making 
> calls to the index from Python. It seems that using a numeric id would 
> speed up our calls to and from Solr from our PySpark ML application.

This should do it:

https://wiki.apache.org/solr/UniqueKey#UUID_techniques

Thanks,
Shawn
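For reference, the UUID technique on that wiki page boils down to an update
processor chain in solrconfig.xml; a sketch (chain name and field name may need
adapting to your schema):

```xml
<!-- Auto-populate the uniqueKey field with a UUID on every add. -->
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Note that UUIDs are strings rather than numbers; if a strictly numeric id is
required, it generally has to be assigned by the client.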



Re: statistics in hitlist

2018-02-23 Thread Joel Bernstein
This is going to be a complex answer because Solr actually now has multiple
ways of doing regression analysis as part of the Streaming Expression
statistical programming library. The basic documentation is here:

https://lucene.apache.org/solr/guide/7_2/statistical-programming.html

Here is a sample expression that performs a simple linear regression in
Solr 7.2:

let(a=random(collection1, q="any query", rows="15000", fl="fieldA, fieldB"),
b=col(a, fieldA),
c=col(a, fieldB),
d=regress(b, c))


The expression above takes a random sample of 15000 results from
collection1. The result set will include fieldA and fieldB in each record.
The result set is stored in variable "a".

Then the "col" function creates arrays of numbers from the results stored
in variable a. The values in fieldA are stored in the variable "b". The
values in fieldB are stored in variable "c".

Then the regress function performs a simple linear regression on arrays
stored in variables "b" and "c".

The output of the regress function is a map containing the regression
result. This result includes RSquared and other attributes of the
regression model such as R (correlation), slope, y-intercept, etc.

Joel Bernstein
http://joelsolr.blogspot.com/
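For readers who want to sanity-check the numbers outside Solr, the same
least-squares quantities that regress() reports can be computed in a few lines.
This plain-Python sketch is an illustration, not Solr's implementation:

```python
# Simple least-squares linear regression, reporting the same quantities
# (slope, intercept, R, RSquared) that Solr's regress() returns.
from math import sqrt

def regress(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)   # Pearson correlation
    return {"slope": slope, "intercept": intercept, "R": r, "RSquared": r * r}

# Tiny worked example: y = 2x + 1 exactly, so RSquared is 1.0.
result = regress([1, 2, 3, 4], [3, 5, 7, 9])
```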

On Fri, Feb 23, 2018 at 3:10 PM, John Smith  wrote:

> Hi Joel, thanks for the answer. I'm not really a stats guy, but the end
> result of all this is supposed to be obtaining R^2. Is there no way of
> obtaining this value, then (short of iterating over all the results in the
> hitlist and calculating it myself)?
>
> On Fri, Feb 23, 2018 at 12:26 PM, Joel Bernstein 
> wrote:
>
> > Typically SSE is the sum of the squared errors of the prediction in a
> > regression analysis. The stats component doesn't perform regression,
> > although it might be a nice feature.
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, Feb 23, 2018 at 12:17 PM, John Smith 
> wrote:
> >
> > > I'm using solr, and enabling stats as per this page:
> > > https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
> > >
> > > I want to get more stat values though. Specifically I'm looking for
> > > r-squared (coefficient of determination). This value is not present in
> > > solr, however some of the pieces used to calculate r^2 are in the stats
> > > element, for example:
> > >
> > > <double name="min">0.0</double>
> > > <double name="max">10.0</double>
> > > <long name="count">15</long>
> > > <long name="missing">17</long>
> > > <double name="sum">85.0</double>
> > > <double name="sumOfSquares">603.0</double>
> > > <double name="mean">5.667</double>
> > > <double name="stddev">2.943920288775949</double>
> > >
> > >
> > > So I have the sumOfSquares available (SST), and using this
> calculation, I
> > > can get R^2:
> > >
> > > R^2 = 1 - SSE/SST
> > >
> > > All I need then is SSE. Is there anyway I can get SSE from those other
> > > stats in solr?
> > >
> > > Thanks in advance!
> > >
> >
>


Rename solrconfig.xml

2018-02-23 Thread Zheng Lin Edwin Yeo
Hi,

Would like to check, how can we rename solrconfig.xml to something else?
For example, I want to rename it to myconfig.xml. Is this possible?

I'm using Solr 6.5.1, and planning to upgrade to Solr 7.2.1.

Regards,
Edwin


custom unique numeric id

2018-02-23 Thread Clay McDonald
Hello,

I'm new to Solr/Lucene and I'd like to know if there is a way to auto-create a 
unique numeric id in a custom field that we can them reference when making 
calls to the index from Python. It seems to use that using a numeric id would 
speed up our calls to and fro Solr from our PySpark ML application.

Thoughts?

Thanks,

Clay



Re: Indexing timeout issues with SolrCloud 7.1

2018-02-23 Thread Tom Peters
I included the last 25 lines from the logs from each of the five nodes during 
that time period.

I _think_ I'm running into issues with batching up deleteByQuery. Quick 
background: we have objects in our system that may have multiple availability 
windows, so when we index an object, we store it as separate documents, each 
with its own begins and expires date. At index time we don't know whether all 
of the windows are still valid, so we remove all of them with a 
deleteByQuery (e.g. deleteByQuery=object_id:12345) and then index one or more 
documents.
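One way to tighten the delete-then-index round trip described above is to send
the deleteByQuery and the replacement documents in a single update message,
assuming the XML update format's <update> wrapper. A sketch with hypothetical
field names:

```python
# One XML update message that carries the deleteByQuery and the
# replacement documents together, instead of two separate requests.
from xml.sax.saxutils import escape

def replace_object(object_id, windows):
    docs = "".join(
        "<doc>"
        f"<field name='object_id'>{object_id}</field>"
        f"<field name='begins'>{escape(w['begins'])}</field>"
        f"<field name='expires'>{escape(w['expires'])}</field>"
        "</doc>"
        for w in windows
    )
    return (f"<update>"
            f"<delete><query>object_id:{object_id}</query></delete>"
            f"<add>{docs}</add>"
            f"</update>")

body = replace_object(12345, [
    {"begins": "2018-03-01T00:00:00Z", "expires": "2018-03-08T00:00:00Z"},
])
# POST the body to /solr/mycollection/update with Content-Type: text/xml
```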

I ran an isolated test a number of times where I indexed 1500 documents in this 
manner (deletes then index). In Solr 3.4, it takes about 15s to complete. In 
Solr 7.1, it's taking about 5m. If I remove the deleteByQuery, the indexing 
times are nearly identical.

When run in normal production mode where we have lots of processes indexing at 
once (~20 or so), it starts to cause lots of issues (which you see below).


Please let me know if anything I mentioned is unclear. Thanks!




solr2-a:
2018-02-23 04:09:36.551 ERROR 
(updateExecutor-2-thread-2672-processing-http:solr2-b:8080//solr//mycollection_shard1_replica_n1
 x:mycollection_shard1_replica_n6 r:core_node9 
n:solr2-a.vam.be.cmh.mycollection.com:8080_solr s:shard1 c:mycollection) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.ErrorReportingConcurrentUpdateSolrClient error
2018-02-23 04:09:36.551 ERROR 
(updateExecutor-2-thread-2692-processing-http:solr2-d:8080//solr//mycollection_shard1_replica_n11
 x:mycollection_shard1_replica_n6 r:core_node9 
n:solr2-a.vam.be.cmh.mycollection.com:8080_solr s:shard1 c:mycollection) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.ErrorReportingConcurrentUpdateSolrClient error
2018-02-23 04:09:36.551 ERROR 
(updateExecutor-2-thread-2711-processing-http:solr2-e:8080//solr//mycollection_shard1_replica_n4
 x:mycollection_shard1_replica_n6 r:core_node9 
n:solr2-a.vam.be.cmh.mycollection.com:8080_solr s:shard1 c:mycollection) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.ErrorReportingConcurrentUpdateSolrClient error
2018-02-23 04:09:36.552 ERROR (qtp1595212853-32739) [c:mycollection s:shard1 
r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on 
replica http://solr2-b:8080/solr/mycollection_shard1_replica_n1/
2018-02-23 04:09:36.552 ERROR (qtp1595212853-32739) [c:mycollection s:shard1 
r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on 
replica http://solr2-d:8080/solr/mycollection_shard1_replica_n11/
2018-02-23 04:09:36.552 ERROR (qtp1595212853-32739) [c:mycollection s:shard1 
r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on 
replica http://solr2-e:8080/solr/mycollection_shard1_replica_n4/
2018-02-23 04:09:38.217 ERROR 
(updateExecutor-2-thread-2712-processing-http:solr2-e:8080//solr//mycollection_shard1_replica_n4
 x:mycollection_shard1_replica_n6 r:core_node9 
n:solr2-a.vam.be.cmh.mycollection.com:8080_solr s:shard1 c:mycollection) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.ErrorReportingConcurrentUpdateSolrClient error
2018-02-23 04:09:38.217 ERROR 
(updateExecutor-2-thread-2726-processing-http:solr2-d:8080//solr//mycollection_shard1_replica_n11
 x:mycollection_shard1_replica_n6 r:core_node9 
n:solr2-a.vam.be.cmh.mycollection.com:8080_solr s:shard1 c:mycollection) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.ErrorReportingConcurrentUpdateSolrClient error
2018-02-23 04:09:38.217 ERROR 
(updateExecutor-2-thread-2727-processing-http:solr2-b:8080//solr//mycollection_shard1_replica_n1
 x:mycollection_shard1_replica_n6 r:core_node9 
n:solr2-a.vam.be.cmh.mycollection.com:8080_solr s:shard1 c:mycollection) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.ErrorReportingConcurrentUpdateSolrClient error
2018-02-23 04:09:38.218 ERROR (qtp1595212853-32260) [c:mycollection s:shard1 
r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on 
replica http://solr2-b:8080/solr/mycollection_shard1_replica_n1/
2018-02-23 04:09:38.218 ERROR (qtp1595212853-32260) [c:mycollection s:shard1 
r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on 
replica http://solr2-d:8080/solr/mycollection_shard1_replica_n11/
2018-02-23 04:09:38.218 ERROR (qtp1595212853-32260) [c:mycollection s:shard1 
r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on 
replica http://solr2-e:8080/solr/mycollection_shard1_replica_n4/
2018-02-23 04:09:50.048 

RE: SOLR Score Range Changed

2018-02-23 Thread Hodder, Rick
ClassicSimilarity helped, but the range of values doesn't have a minimum near 0 
like it did back in version 4.



Are there other attributes/elements to this factory that could get me back the 
old functionality?

-Original Message-
From: Joël Trigalo [mailto:jtrig...@gmail.com] 
Sent: Friday, February 23, 2018 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR Score Range Changed

The difference seems due to the fact that default similarity in solr 7 is
BM25 while it used to be TF-IDF in solr 4. As you realised, BM25 function is 
smoother.
You can configure schema.xml to use ClassicSimilarity, for instance 
https://lucene.apache.org/solr/guide/6_6/major-changes-from-solr-5-to-solr-6.html#default-similarity-changes
https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html#FieldTypeDefinitionsandProperties-FieldTypeSimilarity

But as said before, you may be relying on score properties that are not 
guaranteed, so it would be better to change the score function or sorting 
(rather than falling back to ClassicSimilarity).
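For completeness, the schema.xml element that switches the whole index back to
TF-IDF scoring is a one-liner (placed at the top level of the schema):

```xml
<!-- Use TF-IDF (pre-BM25) scoring index-wide. -->
<similarity class="solr.ClassicSimilarityFactory"/>
```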



RE: SOLR Score Range Changed

2018-02-23 Thread Hodder, Rick
Hi Shawn,

Thanks for your help - I'm still finding my way in the weeds of SOLR.

Combining everything into one query is what I'd prefer because as you said, one 
would think that with everything in the same query, the score would organize 
everything nicely.

>>Assuming you're using the default relevancy sort
Yes

>> does the order of your search results change dramatically from one version 
>> to the other?  If it does, is the order generally better from a relevance 
>> standpoint, or generally worse?  If you are specifying an explicit sort, 
>> then the scores will likely be ignored.

Here's what we do - we have a list of policies with names (among other things, 
but I'll just use names as an example).

We search for several business names to see if we have policies in common with 
the names so that we don’t have too much risk with them.

So let's say I'm doing a search against three business names

Bob's carpentry
Conslidated carpentry of the Greater North West
Carpentry Land

q=(IDX_CompanyName:bob's AND carpentry) OR (IDX_CompanyName: conslidated AND 
carpentry AND of AND the AND Greater AND North AND West) OR (IDX_CompanyName: 
Carpentry AND Land)

Searching for 750 rows returns hits that are all focused on Consolidated 
(seemingly because the number of words pushes the SOLR score into a higher 
range for all Consolidated results, as mentioned in my previous email). 
Searching for all 3 names at the same time doesn't ensure that all 3 companies 
will be in the results, even though each returns results when run separately. 
If I boost maxrows to 4000, I see a few Bob's Carpentry hits, but most are 
still Consolidated.

So the way we had addressed it was running 3 separate SOLR queries, combining 
them, and sorting them by descending score - it wasn't perfect, but it worked, 
and it helped reduce the number of results we hand off to a scoring engine 
that applies 3 algorithms (Monge-Elkan, Jaro-Winkler, and Smith-Waterman 
Affine) to further hone the results - which can take LOTS of time if there are 
a lot of results.


What I am describing is also why it's strongly recommended that you never try 
to convert scores to percentages:

https://wiki.apache.org/lucene-java/ScoresAsPercentages

Thanks,
Shawn



Re: Indexing timeout issues with SolrCloud 7.1

2018-02-23 Thread Deepak Goel
Can you please post all the errors? The current error is only for the node
'solr2-d'.

On 23 Feb 2018 09:42, "Tom Peters"  wrote:

I'm trying to debug why indexing in SolrCloud 7.1 is having so many issues.
It will hang most of the time, and timeout the rest.

Here's an example:

time curl -s 'myhost:8080/solr/mycollection/update/json/docs' -d
'{"solr_id":"test_001", "data_type":"test"}'|jq .
{
  "responseHeader": {
"status": 0,
"QTime": 5004
  }
}
curl -s 'myhost:8080/solr/mycollection/update/json/docs' -d   0.00s
user 0.00s system 0% cpu 5.025 total
jq .  0.01s user 0.00s system 0% cpu 5.025 total

Here's some of the timeout errors I'm seeing:

2018-02-23 03:55:02.903 ERROR (qtp1595212853-3607) [c:mycollection
s:shard1 r:core_node12 x:mycollection_shard1_replica_n11]
o.a.s.h.RequestHandlerBase java.io.IOException:
java.util.concurrent.TimeoutException:
Idle timeout expired: 12/12 ms
2018-02-23 03:55:02.903 ERROR (qtp1595212853-3607) [c:mycollection
s:shard1 r:core_node12 x:mycollection_shard1_replica_n11]
o.a.s.s.HttpSolrCall null:java.io.IOException:
java.util.concurrent.TimeoutException:
Idle timeout expired: 12/12 ms
2018-02-23 03:55:36.517 ERROR (recoveryExecutor-3-thread-4-
processing-n:solr2-d.myhost:8080_solr x:mycollection_shard1_replica_n11
s:shard1 c:mycollection r:core_node12) [c:mycollection s:shard1
r:core_node12 x:mycollection_shard1_replica_n11] o.a.s.h.ReplicationHandler
Index fetch failed :org.apache.solr.common.SolrException: Index fetch
failed :
2018-02-23 03:55:36.517 ERROR (recoveryExecutor-3-thread-4-
processing-n:solr2-d.myhost:8080_solr x:mycollection_shard1_replica_n11
s:shard1 c:mycollection r:core_node12) [c:mycollection s:shard1
r:core_node12 x:mycollection_shard1_replica_n11] o.a.s.c.RecoveryStrategy
Error while trying to recover:org.apache.solr.common.SolrException:
Replication for recovery failed.


We currently have two separate Solr clusters. Our current in-production
cluster which runs on Solr 3.4 and a new ring that I'm trying to bring up
which runs on SolrCloud 7.1. I have the exact same code that is indexing to
both clusters. The Solr 3.4 indexes fine, but I'm running into lots of
issues with SolrCloud 7.1.


Some additional details about the setup:

* 5 nodes solr2-a through solr2-e.
* 5 replicas
* 1 shard
* The servers have 48G of RAM with -Xmx and -Xms set to 16G
* I currently have soft commits at 10m intervals and hard commits (with
openSearcher=false) at 1m intervals. I also tried 5m (soft) and 15s (hard)
as well.

Any help or pointers would be greatly appreciated. Thanks!


This message and any attachment may contain information that is
confidential and/or proprietary. Any use, disclosure, copying, storing, or
distribution of this e-mail or any attached file by anyone other than the
intended recipient is strictly prohibited. If you have received this
message in error, please notify the sender by reply email and delete the
message and any attachments. Thank you.


Re: configure jetty to use both http1.1 and H2

2018-02-23 Thread Jeff Dyke
Answering a bit of my own question, the underlying jetty would have to be
built with it, and get pushed into its jar directory.

I think i'll put nginx in front of this, do a quick proxy forcing 1.1 and
move on, but if anyone knows any tricks, it'll be good just for
thoroughness of this thread and my curiosity.
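A minimal sketch of that nginx front-end (server name, certificate paths, and
ports are hypothetical; proxy_http_version pins the upstream side to HTTP/1.1):

```nginx
# Terminate H/2 from clients here; proxy to Solr over HTTP/1.1.
server {
    listen 443 ssl http2;
    server_name solr.example.com;                    # hypothetical

    ssl_certificate     /etc/nginx/certs/solr.pem;   # hypothetical paths
    ssl_certificate_key /etc/nginx/certs/solr.key;

    location / {
        proxy_pass http://127.0.0.1:8983;  # Solr's default port
        proxy_http_version 1.1;            # force HTTP/1.1 upstream
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```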

Best,
Jeff

On Fri, Feb 23, 2018 at 3:11 PM, Jeff Dyke  wrote:

> Thanks for the tip Jason.  I didn't see the -j option there or here
> https://lucene.apache.org/solr/guide/7_2/solr-control-script-reference.html
>
> I'll keep this short, i tried to add it to the init.d script and then
> interacting directly with the solr binary, but ultimately saw that
> logs/solr-console-8983.log was updated with an exception saying dependency
> not met.
>
> It looks like only /opt/solr/server/lib/jetty-http-9.3.20.v20170531.jar
> is bundled.
>
> So i guess at this point i have one question that i don't think i'd go
> through with, b/c i'd like to keep the application install clean. But out of
> curiosity: will solr pick up any jar in the lib directory, so that i could
> then pass -j --module=http2? Or perhaps define it where module=http is
> defined, though that may just be passed at start, i'd assume.
>
> Otherwise i may just put nginx in front of the master admin, which has A
> LOT of other security around it, b/c i am trying to access it from outside
> my VPC.
>
> Thanks again!
> Jeff
>
>
>
> On Fri, Feb 23, 2018 at 1:53 PM, Jason Gerlowski 
> wrote:
>
>> Hi Jeff,
>>
>> I haven't tested your exact use case regarding H/2, but the "bin/solr"
>> startup script has a special "-j" option that can be used to pass
>> arbitrary flags to the underlying Jetty server.  If you have options
>> that work with vanilla Jetty, they _should_ work when passed through
>> the "bin/solr" interface as well.  Check out "bin/solr start -help"
>> for more info.
>>
>> If it doesn't work out, please let us know, and post the commands you
>> tried, output, etc.
>>
>> Best,
>>
>> Jason
>>
>> On Fri, Feb 23, 2018 at 11:56 AM, Jeff Dyke  wrote:
>> > Hi, I've been googling around for a while and can't seem to find an answer
>> > to this.  Is it possible to have the embedded jetty listen to H/2 as well
>> > as HTTP/1.1, mainly i'd like to use this to access it on a private subnet
>> > on AWS through HAProxy which is set up to prefer H/2.
>> >
>> > With base jetty its as simple as passing arguments to start.jar, but can't
>> > find how to solve it with solr and the embedded jetty.
>> >
>> > Thanks,
>> > Jeff
>>
>
>


Re: configure jetty to use both http1.1 and H2

2018-02-23 Thread Jeff Dyke
Thanks for the tip Jason.  I didn't see the -j option there or here
https://lucene.apache.org/solr/guide/7_2/solr-control-script-reference.html

I'll keep this short, i tried to add it to the init.d script and then
interacting directly with the solr binary, but ultimately saw that
logs/solr-console-8983.log was updated with an exception saying dependency
not met.

It looks like only /opt/solr/server/lib/jetty-http-9.3.20.v20170531.jar is
bundled.

So i guess at this point i have one question that i don't think i'd go
through with, b/c i'd like to keep the application install clean. But out of
curiosity: will solr pick up any jar in the lib directory, so that i could
then pass -j --module=http2? Or perhaps define it where module=http is
defined, though that may just be passed at start, i'd assume.

Otherwise i may just put nginx in front of the master admin, which has A
LOT of other security around it, b/c i am trying to access it from outside
my VPC.

Thanks again!
Jeff



On Fri, Feb 23, 2018 at 1:53 PM, Jason Gerlowski 
wrote:

> Hi Jeff,
>
> I haven't tested your exact use case regarding H/2, but the "bin/solr"
> startup script has a special "-j" options that can be used to pass
> arbitrary flags to the underlying Jetty server.  If you have options
> that work with vanilla Jetty, they _should_ work when passed through
> the "bin/solr" interface as well.  Check out "bin/solr start -help"
> for more info.
>
> If it doesn't work out, please let us know, and post the commands you
> tried, output, etc.
>
> Best,
>
> Jason
>
> On Fri, Feb 23, 2018 at 11:56 AM, Jeff Dyke  wrote:
> > Hi, I've been googling around for a while and can't seem to find an answer
> > to this.  Is it possible to have the embedded jetty listen to H/2 as well
> > as HTTP/1.1, mainly i'd like to use this to access it on a private subnet
> > on AWS through HAProxy which is set up to prefer H/2.
> >
> > With base jetty its as simple as passing arguments to start.jar, but can't
> > find how to solve it with solr and the embedded jetty.
> >
> > Thanks,
> > Jeff
>


Re: statistics in hitlist

2018-02-23 Thread John Smith
Hi Joel, thanks for the answer. I'm not really a stats guy, but the end
result of all this is supposed to be obtaining R^2. Is there no way of
obtaining this value, then (short of iterating over all the results in the
hitlist and calculating it myself)?

On Fri, Feb 23, 2018 at 12:26 PM, Joel Bernstein  wrote:

> Typically SSE is the sum of the squared errors of the prediction in a
> regression analysis. The stats component doesn't perform regression,
> although it might be a nice feature.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Feb 23, 2018 at 12:17 PM, John Smith  wrote:
>
> > I'm using solr, and enabling stats as per this page:
> > https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
> >
> > I want to get more stat values though. Specifically I'm looking for
> > r-squared (coefficient of determination). This value is not present in
> > solr, however some of the pieces used to calculate r^2 are in the stats
> > element, for example:
> >
> > <double name="min">0.0</double>
> > <double name="max">10.0</double>
> > <long name="count">15</long>
> > <long name="missing">17</long>
> > <double name="sum">85.0</double>
> > <double name="sumOfSquares">603.0</double>
> > <double name="mean">5.667</double>
> > <double name="stddev">2.943920288775949</double>
> >
> >
> > So I have the sumOfSquares available (SST), and using this calculation, I
> > can get R^2:
> >
> > R^2 = 1 - SSE/SST
> >
> > All I need then is SSE. Is there anyway I can get SSE from those other
> > stats in solr?
> >
> > Thanks in advance!
> >
>
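As a numeric illustration of why SSE can't come from the stats component alone:
SSE depends on the fitted line, not just the field's moments. Note also that
Solr's sumOfSquares is the raw sum of y squared, while SST here is the sum of
squared deviations about the mean (sumOfSquares minus count times mean squared).
With a fit in hand, R^2 = 1 - SSE/SST, which for simple linear regression
equals the squared correlation:

```python
# R^2 two ways: 1 - SSE/SST from the fitted line, and squared correlation.
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = my - slope * mx

sst = sum((y - my) ** 2 for y in ys)          # total sum of squares (about mean)
sse = sum((y - (slope * x + intercept)) ** 2  # residual sum of squares
          for x, y in zip(xs, ys))
r_squared = 1 - sse / sst

# Cross-check: for simple linear regression, R^2 == correlation^2.
corr = sxy / sqrt(sxx * sst)
```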


Re: Turn on/off query based on a url parameter

2018-02-23 Thread Roopa Rao
Thanks,
I got it working as below: the feature evaluates to true or false based on the
efi parameter is_var (field and value should be substituted with the correct
names).

{
"store": "featurestore",
"name": "isfeaturematch",
"class": "org.apache.solr.ltr.feature.SolrFeature",
"params": {
"q": "{!func}and(${is_var:false}, query({!v='field:value'}))"
}
}
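At query time the efi value is supplied through the ltr query parser's efi.*
parameters; a sketch of building the request (model name and rerank depth are
hypothetical):

```python
# Pass the efi flag into the LTR rerank query at request time.
from urllib.parse import urlencode

def ltr_params(q, model, is_var):
    """Build query parameters; efi.is_var feeds the SolrFeature above."""
    rq = f"{{!ltr model={model} reRankDocs=100 efi.is_var={str(is_var).lower()}}}"
    return urlencode({"q": q, "rq": rq, "fl": "id,score,[features]"})

params = ltr_params("test", "myModel", True)  # model name is hypothetical
```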


On Thu, Feb 22, 2018 at 2:46 PM, Phil Scadden  wrote:

> I always filter solr request via a proxy (so solr itself is not exposed
> directly to the web). In that proxy, the query parameters can be broken
> down and filtered as desired (I examine authorities granted to a session to
> control even which indexes are being searched) before passing the modified
> url to solr. The coding of the proxy obviously depends on your application
> environment. We use java and Spring.
>
> -Original Message-
> From: Roopa Rao [mailto:roop...@gmail.com]
> Sent: Friday, 23 February 2018 8:04 a.m.
> To: solr-user@lucene.apache.org
> Subject: Turn on/off query based on a url parameter
>
> Hi,
>
> I want to enable or disable a SolrFeature in LTR based on efi parameter.
>
> In simple the query should be executed only if a parameter is true.
>
> Any examples or suggestion on how to accomplish this?
>
> Functions queries examples are are using fields to give a value to. In my
> case I want to execute the query only if a url parameter is true
>
> Thanks,
> Roopa
> Notice: This email and any attachments are confidential and may not be
> used, published or redistributed without the prior written consent of the
> Institute of Geological and Nuclear Sciences Limited (GNS Science). If
> received in error please destroy and immediately notify GNS Science. Do not
> copy or disclose the contents.
>


Re: Atomic Updates : Performance Impact

2018-02-23 Thread Uday Jami
Thanks Erick for the useful information. Will keep the below points in mind
while designing my solution.

Thanks,
Uday

On Sat, Feb 24, 2018 at 12:47 AM, Erick Erickson 
wrote:

> bq: However if i dont have majority of other column data while doing update
> operations, is it better to go with atomic update?
>
> I don't understand what you're asking. To use Atomic Updates, _every_
> original field (i.e. any field that is _not_ the destination of a
> copyField directive) must be stored. That's just a basic requirement.
>
> bq: And also during the update process, if there is a simultaneous search
> request on the collection, will there be any lag in response?
>
> This is just like any other update, the changes will be visible after
> the next soft commit or hard-commit-with-openSearcher=true.
>
> Best,
> Erick
>
> On Fri, Feb 23, 2018 at 9:39 AM, Uday Jami  wrote:
> > Hello Erick,
> >
> > Thanks for the explanation.
> > However if i dont have majority of other column data while doing update
> > operations, is it better to go with atomic update?
> >
> > And also during the update process, if there is a simultaneous search
> > request on the collection, will there be any lag in response?
> >
> >
> > Thanks,
> > Uday
> >
> > On Fri, Feb 23, 2018 at 10:47 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> The approximate amount of work will be very close to what it would
> >> take Solr to just index the documents from a client. Actually it puts
> >> a little _more_ of a load on Solr. In the case you do an Atomic
> >> Update, Solr has to
> >> 1> fetch all the stored fields from the index
> >> 2> construct a Solr document
> >> 3> change the values in the doc based on the atomic update
> >> 4> re-index the doc just as though it had received it from a client.
> >>
> >> Whereas if you just send the doc from an external client Solr has to
> >> 1> de-serialize the doc
> >> 2> index it (identical to step 4 above)
> >>
> >> The sweet spot for Atomic Updates is when you can't easily get the
> >> original document from the system-of-record.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Feb 23, 2018 at 9:02 AM, Uday Jami  wrote:
> >> > Can you please let me know what will be the performance impact of
> trying
> >> to
> >> > update 120Million records in a collection containing 1 billion
> records.
> >> > The collection contains around 30 columns and only one column out of
> it
> >> is
> >> > updated as part of atomic update.
> >> > Its not a batch update, the 120 Million updates will happen within 24
> >> hours.
> >> >
> >> > How is the search on the above collection going to get impacted during
> >> the
> >> > above update process.
> >> >
> >> > Thanks,
> >> > Uday
> >>
>


Re: Atomic Updates : Performance Impact

2018-02-23 Thread Erick Erickson
bq: However if i dont have majority of other column data while doing update
operations, is it better to go with atomic update?

I don't understand what you're asking. To use Atomic Updates, _every_
original field (i.e. any field that is _not_ the destination of a
copyField directive) must be stored. That's just a basic requirement.

bq: And also during the update process, if there is a simultaneous search
request on the collection, will there be any lag in response?

This is just like any other update, the changes will be visible after
the next soft commit or hard-commit-with-openSearcher=true.

Best,
Erick

On Fri, Feb 23, 2018 at 9:39 AM, Uday Jami  wrote:
> Hello Erick,
>
> Thanks for the explanation.
> However if i dont have majority of other column data while doing update
> operations, is it better to go with atomic update?
>
> And also during the update process, if there is a simultaneous search
> request on the collection, will there be any lag in response?
>
>
> Thanks,
> Uday
>
> On Fri, Feb 23, 2018 at 10:47 PM, Erick Erickson 
> wrote:
>
>> The approximate amount of work will be very close to what it would
>> take Solr to just index the documents from a client. Actually it puts
>> a little _more_ of a load on Solr. In the case you do an Atomic
>> Update, Solr has to
>> 1> fetch all the stored fields from the index
>> 2> construct a Solr document
>> 3> change the values in the doc based on the atomic update
>> 4> re-index the doc just as though it had received it from a client.
>>
>> Whereas if you just send the doc from an external client Solr has to
>> 1> de-serialize the doc
>> 2> index it (identical to step 4 above)
>>
>> The sweet spot for Atomic Updates is when you can't easily get the
>> original document from the system-of-record.
>>
>> Best,
>> Erick
>>
>> On Fri, Feb 23, 2018 at 9:02 AM, Uday Jami  wrote:
>> > Can you please let me know what will be the performance impact of trying
>> to
>> > update 120Million records in a collection containing 1 billion records.
>> > The collection contains around 30 columns and only one column out of it
>> is
>> > updated as part of atomic update.
>> > Its not a batch update, the 120 Million updates will happen within 24
>> hours.
>> >
>> > How is the search on the above collection going to get impacted during
>> the
>> > above update process.
>> >
>> > Thanks,
>> > Uday
>>


Re: configure jetty to use both http1.1 and H2

2018-02-23 Thread Jason Gerlowski
Hi Jeff,

I haven't tested your exact use case regarding H/2, but the "bin/solr"
startup script has a special "-j" options that can be used to pass
arbitrary flags to the underlying Jetty server.  If you have options
that work with vanilla Jetty, they _should_ work when passed through
the "bin/solr" interface as well.  Check out "bin/solr start -help"
for more info.

If it doesn't work out, please let us know, and post the commands you
tried, output, etc.

Best,

Jason

On Fri, Feb 23, 2018 at 11:56 AM, Jeff Dyke  wrote:
> Hi, I've been googling around for a while and can't seem to find an answer
> to this.  Is it possible to have the embedded jetty listen to H/2 as well
> has HTTP/1.1, mainly i'd like to use this to access it on a private subnet
> on AWS through HAProxy which is set up to prefer H/2.
>
> With base jetty its as simple as passing arguments to start.jar, but can't
> find how to solve it with solr and the embedded jetty.
>
> Thanks,
> Jeff


Re: CustomScoreProvider Sucks, Need Help

2018-02-23 Thread Walter Underwood
By “6lac row”, do you mean you are fetching 600,000 results? That will be very, 
very slow.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 23, 2018, at 9:12 AM, ~$alpha`  wrote:
> 
> public class MatchingScoreProvider extends CustomScoreProvider {
> 
> }
> 
> Issues:
> 1. CustomScoreProvider works but is too slow.
> Even when I am writing return 1 on 1st line it's still taking  5 seconds for
> 6lac row.
> 
> 2. I added  @Slf4j but still not logger not working
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: StandardTokenizer and splitting on mixedcase strings

2018-02-23 Thread Rick Leir
Dan,
Lowercase filter before the tokenizer?
Cheers -- Rick

On February 23, 2018 6:08:27 AM EST, "Dan ."  wrote:
>Hi,
>
>The StandardTokenizerFactory splits strings like 'JavaScript' into
>'Java'
>and 'Script', but then searches with 'javascript' do not match the
>document.
>
>Is there a solr way to prevent StandardTokenizer from splitting
>mixedcase
>strings?
>
>Cheers,
>Dan

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Object not fetched because its identifier appears to be already in processing

2018-02-23 Thread Rick Leir
Ven,
Where do you see that message? 
Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: At which solr version was "Managed-schema" set as default?

2018-02-23 Thread BlackIce
Thank you,

the current Nutch documentation, for Nutch 1.14 which is built against Solr
6.6.x, states to delete the "managed-schema" file, provide the Nutch-specific
schema.xml, and THEN create the core.

This works: it creates the core with all Nutch-specific fields, but Solr
then automatically reverts to the managed-schema, which also works.

This is all part of a quick-start tutorial...

My idea, in order to avoid the confusion that has been arising when
people start to modify schema.xml, is to set Solr back into classic
"schema mode" by having the user issue a command like sed or awk to
add the corresponding line to solrconfig.xml before creating the core,
along with a friendly reminder that this is just a "quick start" and
that they should study the Solr documentation for a more complex setup.

Any thoughts and suggestions are greatly appreciated!

RRK

On Fri, Feb 23, 2018 at 5:51 PM, Shawn Heisey  wrote:

> On 2/23/2018 6:23 AM, BlackIce wrote:
>
>> I'm reworking some documentation for the Nutch project, and for the sake
>> of
>> correctness and completness could someone tell me at which version did
>> Solr
>> switch over to the "managed-Schema" by default?
>>
>
> It was version 5.5.
>
> In the versions before that, the primary example configset had been
> running managed-schema for a number of minor versions. It was explicitly
> stated in the configuration.
>
> In the latest version, all example configsets are managed-schema, but the
> techproducts example does NOT have the update processor that automatically
> adds unknown fields.
>
> Thanks,
> Shawn
>
>


Re: Atomic Updates : Performance Impact

2018-02-23 Thread Uday Jami
Hello Erick,

Thanks for the explanation.
However, if I don't have most of the other column data while doing update
operations, is it better to go with atomic updates?

And also during the update process, if there is a simultaneous search
request on the collection, will there be any lag in response?


Thanks,
Uday

On Fri, Feb 23, 2018 at 10:47 PM, Erick Erickson 
wrote:

> The approximate amount of work will be very close to what it would
> take Solr to just index the documents from a client. Actually it puts
> a little _more_ of a load on Solr. In the case you do an Atomic
> Update, Solr has to
> 1> fetch all the stored fields from the index
> 2> construct a Solr document
> 3> change the values in the doc based on the atomic update
> 4> re-index the doc just as though it had received it from a client.
>
> Whereas if you just send the doc from an external client Solr has to
> 1> de-serialize the doc
> 2> index it (identical to step 4 above)
>
> The sweet spot for Atomic Updates is when you can't easily get the
> original document from the system-of-record.
>
> Best,
> Erick
>
> On Fri, Feb 23, 2018 at 9:02 AM, Uday Jami  wrote:
> > Can you please let me know what will be the performance impact of trying
> to
> > update 120Million records in a collection containing 1 billion records.
> > The collection contains around 30 columns and only one column out of it
> is
> > updated as part of atomic update.
> > Its not a batch update, the 120 Million updates will happen within 24
> hours.
> >
> > How is the search on the above collection going to get impacted during
> the
> > above update process.
> >
> > Thanks,
> > Uday
>


Re: StandardTokenizer and splitting on mixedcase strings

2018-02-23 Thread Steve Rowe
Hi Dan,

StandardTokenizerFactory does not do this.

Maybe you have a filter in your analysis chain that does this?  E.g. 
WordDelimiterFilterFactory has this capability.
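If a WordDelimiterFilterFactory in the chain turns out to be the culprit, its case-change splitting can be switched off. A hedged schema fragment (the field type name and the other attributes are illustrative, not from the original post):

```xml
<fieldType name="text_mixedcase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- splitOnCaseChange="0" keeps 'JavaScript' as a single token -->
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="0"
            generateWordParts="1"
            generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With case-change splitting disabled, the LowerCaseFilter afterward still lets a lowercase query like 'javascript' match the indexed token.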

--
Steve
www.lucidworks.com

> On Feb 23, 2018, at 6:08 AM, Dan .  wrote:
> 
> Hi,
> 
> The StandardTokenizerFactory splits strings like 'JavaScript' into 'Java'
> and 'Script', but then searches with 'javascript' do not match the document.
> 
> Is there a solr way to prevent StandardTokenizer from splitting mixedcase
> strings?
> 
> Cheers,
> Dan



Re: statistics in hitlist

2018-02-23 Thread Joel Bernstein
Typically SSE is the sum of the squared errors of the prediction in a
regression analysis. The stats component doesn't perform regression,
although it might be a nice feature.



Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Feb 23, 2018 at 12:17 PM, John Smith  wrote:

> I'm using solr, and enabling stats as per this page:
> https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
>
> I want to get more stat values though. Specifically I'm looking for
> r-squared (coefficient of determination). This value is not present in
> solr, however some of the pieces used to calculate r^2 are in the stats
> element, for example:
>
> 0.0
> 10.0
> 15
> 17
> 85.0
> 603.0
> 5.667
> 2.943920288775949
>
>
> So I have the sumOfSquares available (SST), and using this calculation, I
> can get R^2:
>
> R^2 = 1 - SSE/SST
>
> All I need then is SSE. Is there anyway I can get SSE from those other
> stats in solr?
>
> Thanks in advance!
>


Re: Indexing timeout issues with SolrCloud 7.1

2018-02-23 Thread Rick Leir
Tom
I think you are saying that all updates fail? Need to do a bit of 
troubleshooting. How about queries? What else is in the logs?
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Atomic Updates : Performance Impact

2018-02-23 Thread Erick Erickson
The approximate amount of work will be very close to what it would
take Solr to just index the documents from a client. Actually it puts
a little _more_ of a load on Solr. In the case you do an Atomic
Update, Solr has to
1> fetch all the stored fields from the index
2> construct a Solr document
3> change the values in the doc based on the atomic update
4> re-index the doc just as though it had received it from a client.

Whereas if you just send the doc from an external client Solr has to
1> de-serialize the doc
2> index it (identical to step 4 above)

The sweet spot for Atomic Updates is when you can't easily get the
original document from the system-of-record.
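For reference, an Atomic Update request body carries only the uniqueKey plus the modified fields with update modifiers; a minimal sketch sent to /update (the field names here are made up for illustration):

```json
[
  { "id": "doc1",
    "price_f":      { "set": 9.99 },
    "popularity_i": { "inc": 1 } }
]
```

Solr then performs steps 1-4 above internally: it fetches the stored fields for "doc1", applies the "set" and "inc" modifiers, and re-indexes the whole document.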

Best,
Erick

On Fri, Feb 23, 2018 at 9:02 AM, Uday Jami  wrote:
> Can you please let me know what will be the performance impact of trying to
> update 120Million records in a collection containing 1 billion records.
> The collection contains around 30 columns and only one column out of it is
> updated as part of atomic update.
> Its not a batch update, the 120 Million updates will happen within 24 hours.
>
> How is the search on the above collection going to get impacted during the
> above update process.
>
> Thanks,
> Uday


statistics in hitlist

2018-02-23 Thread John Smith
I'm using solr, and enabling stats as per this page:
https://lucene.apache.org/solr/guide/6_6/the-stats-component.html

I want to get more stat values though. Specifically I'm looking for
r-squared (coefficient of determination). This value is not present in
solr, however some of the pieces used to calculate r^2 are in the stats
element, for example:

0.0
10.0
15
17
85.0
603.0
5.667
2.943920288775949


So I have the sumOfSquares available (SST), and using this calculation, I
can get R^2:

R^2 = 1 - SSE/SST

All I need then is SSE. Is there anyway I can get SSE from those other
stats in solr?

Thanks in advance!


Re: SolrException: Error Instantiating queryParser, com.site.s.CustomQParserPlugin failed to instantiate org.apache.solr.search.QParserPlugin

2018-02-23 Thread ~$alpha`
Resolved by including Solr as a dependency using Maven.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


CustomScoreProvider Sucks, Need Help

2018-02-23 Thread ~$alpha`
public class MatchingScoreProvider extends CustomScoreProvider {

}

Issues:
1. CustomScoreProvider works but is too slow.
Even when I just return 1 on the first line, it's still taking 5 seconds for
6lac rows.

2. I added @Slf4j but the logger is still not working



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Atomic Updates : Performance Impact

2018-02-23 Thread Uday Jami
Can you please let me know what will be the performance impact of trying to
update 120Million records in a collection containing 1 billion records.
The collection contains around 30 columns and only one column out of it is
updated as part of atomic update.
It's not a batch update; the 120 million updates will happen within 24 hours.

How is the search on the above collection going to get impacted during the
above update process.

Thanks,
Uday


configure jetty to use both http1.1 and H2

2018-02-23 Thread Jeff Dyke
Hi, I've been googling around for a while and can't seem to find an answer
to this.  Is it possible to have the embedded Jetty listen on H/2 as well
as HTTP/1.1? Mainly I'd like to use this to access it on a private subnet
on AWS through HAProxy which is set up to prefer H/2.

With base jetty its as simple as passing arguments to start.jar, but can't
find how to solve it with solr and the embedded jetty.

Thanks,
Jeff


Re: At which solr version was "Managed-schema" set as default?

2018-02-23 Thread Shawn Heisey

On 2/23/2018 6:23 AM, BlackIce wrote:

I'm reworking some documentation for the Nutch project, and for the sake of
correctness and completness could someone tell me at which version did Solr
switch over to the "managed-Schema" by default?


It was version 5.5.

In the versions before that, the primary example configset had been 
running managed-schema for a number of minor versions. It was explicitly 
stated in the configuration.


In the latest version, all example configsets are managed-schema, but 
the techproducts example does NOT have the update processor that 
automatically adds unknown fields.


Thanks,
Shawn



Object not fetched because its identifier appears to be already in processing

2018-02-23 Thread YELESWARAPU, VENKATA BHAN
Information Classification: Limited Access

Dear Users,


While the indexing job is running, we are seeing the below message for all the
objects:

Object not fetched because its identifier appears to be already in processing

What is the issue, and how can we resolve it so that indexing can work?
Could you please guide us?



Thank you,

Dutt



Re: LTR and 'searching' a streaming expression result

2018-02-23 Thread Joel Bernstein
In the scenario you describe above the answer is no. That's because the
joins rely on the sort order of the result set and require exporting of the
entire result set. Both those requirements will not work with ltr.

The search expression though could be used with ltr and the fetch
expression, which doesn't require full export or a specific sort order.
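A sketch of that combination (the collection, model name, and fields are hypothetical; the assumption here is that the rq parameter is passed through by the search expression to the /select handler):

```
fetch(myCollection,
      search(myCollection, q="title:solr", fl="id,score", sort="score desc",
             rows="50", rq="{!ltr model=myModel reRankDocs=50}"),
      fl="title,author",
      on="id=id")
```

The inner search returns the LTR-reranked top documents, and fetch decorates each tuple with extra fields without requiring full export or a particular sort.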

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Feb 23, 2018 at 5:54 AM, Gintautas Sulskus <
gintautas.suls...@gmail.com> wrote:

> Hi,
>
> Is it possible to apply another search to a streaming expression result?
> E.g. to use leftOuterJoin as a source for search:
>
> search(
> leftOuterJoin(
> leftOuterJoin(search(), search())
> leftOuterJoin(..)
> ),
> q=... )
>
> Is it possible to apply LTR to the streaming expression result?
>
>
> Thanks,
> Gintas
>


Re: Spark-Solr -- unresolved dependencies

2018-02-23 Thread Shawn Heisey

On 2/23/2018 4:50 AM, Selvam Raman wrote:

spark version - EMR 2.0.0

spark-shell --packages com.lucidworks.spark:spark-solr:3.0.1

when i tired about command, am getting below error


::

::  UNRESOLVED DEPENDENCIES ::

::

:: org.restlet.jee#org.restlet;2.3.0: not found

:: org.restlet.jee#org.restlet.ext.servlet;2.3.0: not found

::


This does not look like anything included with Solr.  The error looks 
entirely like third-party software.  You're going to need to talk to 
whoever made that software for help.


Thanks,
Shawn



Re: Solr not accessible - javax.net.ssl.SSLException

2018-02-23 Thread Shawn Heisey

On 2/22/2018 11:16 PM, protonmail4us wrote:
ERROR: Failed to get system information from 
https://localhost:8282/solr due to: javax.net.ssl.SSLException: 
Certificate for doesn't match any of the subject alternative names: 
[*.ishippo.com, ishippo.com]


It says the certificate is valid for *.ishippo.com and ishippo.com ... 
but the message says that the hostname in the URL being used is 
localhost.  That's not going to work unless you disable certificate 
validation.


If this message is in the server log and you're running SolrCloud, then 
you're going to have to change the "host" parameter that Solr uses to 
register itself with the cloud information in zookeeper, so that it is 
using a hostname that the certificate covers.


If the message is coming from client code, you're going to need to 
change the URL in the client code.


Thanks,
Shawn



Re: SOLR Score Range Changed

2018-02-23 Thread Joël Trigalo
The difference seems due to the fact that default similarity in solr 7 is
BM25 while it used to be TF-IDF in solr 4. As you realised, BM25 function
is smoother.
You can configure schema.xml to use ClassicSimilarity, for instance
https://lucene.apache.org/solr/guide/6_6/major-changes-from-solr-5-to-solr-6.html#default-similarity-changes
https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html#FieldTypeDefinitionsandProperties-FieldTypeSimilarity

But as said before, maybe you are using properties that are not guaranteed
so it would be better to change score function or sorting (rather than
coming back to ClassicSimilarity)
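For reference, the global override described in those links is a one-line addition at the top level of the schema (a sketch; per-field similarity is also possible):

```xml
<similarity class="solr.ClassicSimilarityFactory"/>
```

This restores the TF-IDF scoring that Solr 4 used by default, but as noted above, relying on absolute score values is fragile either way.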

2018-02-22 18:39 GMT+01:00 Shawn Heisey :

> On 2/22/2018 9:50 AM, Hodder, Rick wrote:
>
>> I am migrating from SOLR 4.10.2 to SOLR 7.1.
>>
>> All seems to be going well, except for one thing: the score that is
>> coming back for the resulting documents is giving different scores.
>>
>
> The absolute score has no meaning when you change something -- the index,
> the query, the software version, etc.  You can't compare absolute scores.
>
> What matters is the relative score of one document to another *in the same
> query*.  The amount of difference is almost irrelevant -- the goal of
> Lucene's score calculation gymnastics is to have one document score higher
> than another, so the *order* is reasonably correct.
>
> Assuming you're using the default relevancy sort, does the order of your
> search results change dramatically from one version to the other?  If it
> does, is the order generally better from a relevance standpoint, or
> generally worse?  If you are specifying an explicit sort, then the scores
> will likely be ignored.
>
> What I am describing is also why it's strongly recommended that you never
> try to convert scores to percentages:
>
> https://wiki.apache.org/lucene-java/ScoresAsPercentages
>
> Thanks,
> Shawn
>
>


Re: Download solr data(only one field) to csv

2018-02-23 Thread Emir Arnautović
Hi Selvam,
Using start/rows to download 10M docs is what is called deep paging. You need
to either use cursors
(https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html) or the
export handler
(https://lucene.apache.org/solr/guide/6_6/exporting-result-sets.html).
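For the cursor route, the request flow looks roughly like this (the collection and field names are illustrative; the sort must include the uniqueKey field, assumed here to be id):

```
# first request starts the cursor with cursorMark=*
/solr/mycollection/select?q=*:*&fl=title&sort=id+asc&rows=10000&cursorMark=*

# each response carries a nextCursorMark; feed it into the next request
/solr/mycollection/select?q=*:*&fl=title&sort=id+asc&rows=10000&cursorMark=<nextCursorMark from previous response>

# stop when nextCursorMark equals the cursorMark you just sent
```

Unlike start/rows, each page costs roughly the same regardless of how deep into the result set you are.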

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 23 Feb 2018, at 13:38, Selvam Raman  wrote:
> 
> Hi,
> 
> I have 10 million of record in solr index. I want to download whole record
> in csv format with one field.
> 
> I have 20+ fields, but i want to download data with (fl=title) only title
> field.
> 
> http://localhost:8983/solr/containerMetadata/select?q=*=external_id_s,container_title_en=csv=true=100
> 
> the above command not seems to effective to download 10 million record.
> Could you please suggest an idea?
> 
> -- 
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"



Re: Issue Using JSON Facet API Buckets in Solr 6.6

2018-02-23 Thread Antelmo Aguilar
Hi Yonik,

Good to hear you were able to reproduce it.  Looking forward for the fix.
Will use the version of Solr that works in the meantime.

-Antelmo

On Thu, Feb 22, 2018 at 5:10 PM, Yonik Seeley  wrote:

> I've reproduced the issue and opened
> https://issues.apache.org/jira/browse/SOLR-12020
>
> -Yonik
>
>
>
> On Thu, Feb 22, 2018 at 11:03 AM, Yonik Seeley  wrote:
> > Thanks Antelmo, I'm trying to reproduce this now.
> > -Yonik
> >
> >
> > On Mon, Feb 19, 2018 at 10:13 AM, Antelmo Aguilar 
> wrote:
> >> Hi all,
> >>
> >> I was wondering if the information I sent is sufficient to look into the
> >> issue.  Let me know if you need anything else from me please.
> >>
> >> Thanks,
> >> Antelmo
> >>
> >> On Thu, Feb 15, 2018 at 1:56 PM, Antelmo Aguilar 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> Here are two pastebins.  The first is the full complete response with
> the
> >>> search parameters used.  The second is the stack trace from the logs:
> >>>
> >>> https://pastebin.com/rsHvKK63
> >>>
> >>> https://pastebin.com/8amxacAj
> >>>
> >>> I am not using any custom code or plugins with the Solr instance.
> >>>
> >>> Please let me know if you need anything else and thanks for looking
> into
> >>> this.
> >>>
> >>> -Antelmo
> >>>
> >>> On Wed, Feb 14, 2018 at 12:56 PM, Yonik Seeley 
> wrote:
> >>>
>  Could you provide the full stack trace containing "Invalid Date
>  String"  and the full request that causes it?
>  Are you using any custom code/plugins in Solr?
>  -Yonik
> 
> 
>  On Mon, Feb 12, 2018 at 4:55 PM, Antelmo Aguilar 
> wrote:
>  > Hi,
>  >
>  > I was using the following part of a query to get facet buckets so
> that I
>  > can use the information in the buckets for some post-processing:
>  >
>  > "json":
>  > "{\"filter\":[\"bundle:pop_sample\",\"has_abundance_data_b:true\",\"has_geodata:true\",\"${project}\"],\"facet\":{\"term\":{\"type\":\"terms\",\"limit\":-1,\"field\":\"${term:species_category}\",\"facet\":{\"collection_dates\":{\"type\":\"terms\",\"limit\":-1,\"field\":\"collection_date\",\"facet\":{\"collection\":{\"type\":\"terms\",\"field\":\"collection_assay_id_s\",\"facet\":{\"abnd\":\"sum(div(sample_size_i,collection_duration_days_i))\""
>  >
>  > Sorry if it is hard to read.  Basically what is was doing was
> getting
>  the
>  > following buckets:
>  >
>  > First bucket will be categorized by "Species category" by default
>  unless we
>  > pass in the request the "term" parameter which we will categories
> the
>  first
>  > bucket by whatever "term" is set to.  Then inside this first
> bucket, we
>  > create another buckets of the "Collection date" category.  Then
> inside
>  the
>  > "Collection date" category buckets, we would use some functions to
> do
>  some
>  > calculations and return those calculations inside the "Collection
> date"
>  > category buckets.
>  >
>  > This query is working fine in Solr 6.2, but I upgraded our instance
> of
>  Solr
>  > 6.2 to the latest 6.6 version.  However it seems that upgrading to
> Solr
>  6.6
>  > broke the above query.  Now it complains when trying to create the
>  buckets
>  > of the "Collection date" category.  I get the following error:
>  >
>  > Invalid Date String:'Fri Aug 01 00:00:00 UTC 2014'
>  >
>  > It seems that when creating the buckets of a date field, it does
> some
>  > conversion of the way the date is stored and causes the error to
> appear.
>  > Does anyone have an idea as to why this error is happening?  I would
>  really
>  > appreciate any help.  Hopefully I was able to explain my issue well.
>  >
>  > Thanks,
>  > Antelmo
> 
> >>>
> >>>
>


At which solr version was "Managed-schema" set as default?

2018-02-23 Thread BlackIce
hi,

I'm reworking some documentation for the Nutch project, and for the sake of
correctness and completeness, could someone tell me at which version did Solr
switch over to the "managed-schema" by default?

Thank you very much!

RRK


Re: Solrj : ConcurrentUpdateSolrClient based on QueueSize and Time

2018-02-23 Thread Santosh Narayan
Thanks Jason. Hope this can be fixed in the next update of SolrJ.



On Thu, Feb 22, 2018 at 10:49 AM, Jason Gerlowski 
wrote:

> My apologies Santosh.  I added that comment a few releases back based
> on a misunderstanding I've only recently been disabused of.  I will
> correct it.
>
> Anyway, Shawn's explanation above is correct.  The queueSize parameter
> doesn't control batching, as he clarified.  Sorry for the trouble.
>
> Best,
>
> Jason
>
> On Wed, Feb 21, 2018 at 8:50 PM, Santosh Narayan
>  wrote:
> > Thanks for the explanation Shawn. Very helpful. I think I got misled by
> the
> > JavaDoc text for
> > *ConcurrentUpdateSolrClient.Builder.withQueueSize*
> > /**
> >  * The number of documents to batch together before sending to Solr.
> If
> > not set, this defaults to 10.
> >  */
> > public Builder withQueueSize(int queueSize) {
> >   if (queueSize <= 0) {
> > throw new IllegalArgumentException("queueSize must be a positive
> > integer.");
> >   }
> >   this.queueSize = queueSize;
> >   return this;
> > }
> >
> >
> >
> > On Thu, Feb 22, 2018 at 9:41 AM, Shawn Heisey 
> wrote:
> >
> >> On 2/21/2018 7:41 AM, Santosh Narayan wrote:
> >> > May be it is my understanding of the documentation. As per the
> >> > JavaDoc, ConcurrentUpdateSolrClient
> >> > buffers all added documents and writes them into open HTTP
> connections.
> >> >
> >> > So I thought that this class would buffer documents in the client side
> >> > itself till the QueueSize is reached and then send all the cached
> >> documents
> >> > together in one HTTP request. Is this not the case?
> >>
> >> That's not how it's designed.
> >>
> >> What ConcurrentUpdateSolrClient does differently than HttpSolrClient or
> >> CloudSolrClient is return control immediately to your program when you
> >> send an update, and begin processing that update in the background.  If
> >> you send a LOT of updates very quickly, then the queue will get larger,
> >> and will typically be processed in parallel by multiple threads.  The
> >> client won't wait for the queue to fill.  Processing of the first update
> >> you send should begin right after you add it.
> >>
> >> Something to consider:  Because control is returned to your program
> >> immediately, and the response is always a success, your program will
> >> never be informed about any problems with your adds when you use the
> >> concurrent client.  The concurrent client is a great choice for initial
> >> bulk indexing, because it offers multi-threaded indexing without any
> >> need to handle the threads yourself.  But you don't get any kind of
> >> error handling.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


Download solr data(only one field) to csv

2018-02-23 Thread Selvam Raman
Hi,

I have 10 million records in a Solr index. I want to download all of the
records in CSV format with one field.

I have 20+ fields, but I want to download data with only the title field
(fl=title).

http://localhost:8983/solr/containerMetadata/select?q=*=external_id_s,container_title_en=csv=true=100

The above command does not seem to be an effective way to download 10 million
records. Could you please suggest an idea?

-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Spark-Solr -- unresolved dependencies

2018-02-23 Thread Selvam Raman
Hi,

spark version - EMR 2.0.0

spark-shell --packages com.lucidworks.spark:spark-solr:3.0.1

when i tired about command, am getting below error


::

::  UNRESOLVED DEPENDENCIES ::

::

:: org.restlet.jee#org.restlet;2.3.0: not found

:: org.restlet.jee#org.restlet.ext.servlet;2.3.0: not found

::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved
dependency: org.restlet.jee#org.restlet;2.3.0: not found, unresolved
dependency: org.restlet.jee#org.restlet.ext.servlet;2.3.0: not found]
at
org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1066)
at
org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:294)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:158)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


StandardTokenizer and splitting on mixedcase strings

2018-02-23 Thread Dan .
Hi,

The StandardTokenizerFactory splits strings like 'JavaScript' into 'Java'
and 'Script', but then searches with 'javascript' do not match the document.

Is there a solr way to prevent StandardTokenizer from splitting mixedcase
strings?

Cheers,
Dan


LTR and 'searching' a streaming expression result

2018-02-23 Thread Gintautas Sulskus
Hi,

Is it possible to apply another search to a streaming expression result?
E.g. to use leftOuterJoin as a source for search:

search(
leftOuterJoin(
leftOuterJoin(search(), search())
leftOuterJoin(..)
),
q=... )

Is it possible to apply LTR to the streaming expression result?


Thanks,
Gintas


Re: Solr Basic Authentication setup issue (password SolrRocks not accepted) on Solr6.1.0/Zkp3.4.6

2018-02-23 Thread Atita Arora
Hi,

I tried the same on version 7.0.1 and it works with the same json.
However , I remember setting this up for another client who used the same
version and they reported similar issues.
They later planned an upgrade to resolve this.

I would also advise you to look into SOLR-9188 and SOLR-9640.
The internode communication is, as far as I can tell, a buggy feature in
BasicAuth on Solr 6.1, which eventually got fixed in later versions.

Thanks,
Atita


On Fri, Feb 23, 2018 at 1:25 PM, Tarjono, C. A. 
wrote:

> Dear All,
>
>
>
> We are trying to implement basic authentication in our solrcloud
> implementation. We followed the PDF (for version 6.1.0) as below:
>
>1. Start Solr
>2. Created security.json
>
> {
>
> "authentication":{
>
> "blockUnknown": true,
>
> "class":"solr.BasicAuthPlugin",
>
> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
>
> },
>
> "authorization":{
>
> "class":"solr.RuleBasedAuthorizationPlugin",
>
> "permissions":[{"name":"security-edit",
> "role":"admin"}],
>
> "user-role":{"solr":"admin"}
>
> }
>
> }
>
>1. Uploaded the new security.json with below command
>
> # ./zkcli.sh -zkhost localhost:2181 -cmd putfile /security.json
> /u02/solr/setup/security.json
>
>1. Open up the solr admin page and prompted with authentication
>2. We try inputting username “solr” and password “SolrRocks” but it
>will not authenticate.
>
>
>
>
>
> From what I understand, that username/password combination is the default
> that will have to be changed later. Any ideas why it is not working?
>
> We tried to check for special characters in the encrypted password, there
> was none. For now we are removing the flag “blockUnknown” as a workaround.
>
>
>
> We are using SolrCloud 6.1.0 and Zookeeper 3.4.6 (ensamble) in our setup.
> Appreciate the input.
>
>
>
>
>
> Best Regards,
>
>
>
> Christopher Tarjono
>
> *Accenture Pte Ltd*
>
>
>
> +65 9347 2484
>
> c.a.tarj...@accenture.com
>
>
>
> --
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise confidential information. If you have
> received it in error, please notify the sender immediately and delete the
> original. Any other use of the e-mail by you is prohibited. Where allowed
> by local law, electronic communications with Accenture and its affiliates,
> including e-mail and instant messaging (including content), may be scanned
> by our systems for the purposes of information security and assessment of
> internal compliance with Accenture policy.
> 
> __
>
> www.accenture.com
>


Solr Basic Authentication setup issue (password SolrRocks not accepted) on Solr6.1.0/Zkp3.4.6

2018-02-23 Thread Tarjono, C. A.
Dear All,

We are trying to implement basic authentication in our solrcloud 
implementation. We followed the PDF (for version 6.1.0) as below:

  1.  Start Solr
  2.  Created security.json
{
"authentication":{
"blockUnknown": true,
"class":"solr.BasicAuthPlugin",

"credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[{"name":"security-edit", 
"role":"admin"}],
"user-role":{"solr":"admin"}
}
}


  1.  Uploaded the new security.json with below command

# ./zkcli.sh -zkhost localhost:2181 -cmd putfile /security.json 
/u02/solr/setup/security.json

  1.  Open up the solr admin page and prompted with authentication
  2.  We try inputting username "solr" and password "SolrRocks" but it will not 
authenticate.




From what I understand, that username/password combination is the default that
will have to be changed later. Any ideas why it is not working?
We tried to check for special characters in the encrypted password; there were
none. For now we are removing the flag "blockUnknown" as a workaround.

We are using SolrCloud 6.1.0 and Zookeeper 3.4.6 (ensamble) in our setup. 
Appreciate the input.


Best Regards,

Christopher Tarjono
Accenture Pte Ltd

+65 9347 2484
c.a.tarj...@accenture.com



