Re: Arabic words search in solr

2017-03-08 Thread mohanmca01
Hi Steve,

Thanks for the support. I tried the cases below, but I'm still not able to get
the expected results.

Case 1 :

Input :  bizNameAr:شرطة + ازكي

Output : {

  "responseHeader": {
"status": 0,
"QTime": 1,
"params": {
  "indent": "true",
  "q": " bizNameAr:شرطة + ازكي",
  "_": "1489041466096",
  "wt": "json"
}
  },
  "response": {
"numFound": 4,
"start": 0,
"docs": [
  {
"id": "82",
"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية
- - مركز شرطة إزكي",
"_version_": 1560298301338681300
  },
  {
"id": "63",
"bizNameAr": "شركة ظفار للتأمين ش.م.ع.ع - فرع ازكي",
"_version_": 1560298301325049900
  },
  {
"id": "56",
"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال
الشرقية  -  - مركز شرطة إبراء",
"_version_": 1560298301319807000
  },
  {
"id": "79",
"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال
الشرقية - - مركز شرطة إبراء",
"_version_": 1560298301335535600
  }
]
  }
}


In this case, documents 63, 56 and 79 do not match the input; id 82 is the only
correct result.



Case 2:


{
  "responseHeader": {
"status": 0,
"QTime": 3,
"params": {
  "indent": "true",
  "q": " bizNameAr:شرطة AND ازكي",
  "_": "1489043935549",
  "wt": "json"
}
  },
  "response": {
"numFound": 0,
"start": 0,
"docs": []
  }
}


If AND is given between the terms, then no results are returned.
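
For illustration, a query that requires both words inside the bizNameAr field could
look like the sketch below (an assumption based on the advice quoted further down,
not something verified against this schema). Note that the bizNameAr: prefix only
applies to the term immediately following it, so in the queries above the second
word falls back to the default field:

q=bizNameAr:(+شرطة +ازكي)

or, with the default operator set to AND for the whole request:

q=bizNameAr:(شرطة ازكي)&q.op=AND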

I saw your products on the Lucidworks website. Do you have any customized
Solr product with Arabic support?

Thanks,



On Thu, Mar 2, 2017 at 7:01 PM, sarowe [via Lucene] <
ml-node+s472066n4323036...@n3.nabble.com> wrote:

> Hi Mohan,
>
> > On Feb 26, 2017, at 1:37 AM, mohanmca01 <[hidden email]
> > wrote:
> >
> > i searched with (bizNameAr: شرطة ازكي), and am getting:
> > […]
> >
> > the expected result is:   "id": "82",
> >  "bizNameAr": "شرطة عمان السلطانية -
> قيادة
> > شرطة محافظة الداخلية - - مركز *شرطة إزكي*",
> >
> > as the above has both the words mentioned in the query (marked as Bold),
> > where the rest have the following:
> >
> >"id": "63",
> >"bizNameAr": "شركة ظفار للتأمين ش.م.ع.ع - فرع ازكي"
> >
> > it has only one word of the query (ازكي)
> >
> >"id": "56",
> >"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال
> الشرقية
> > -  - مركز شرطة إبراء"
> >
> > it has only one word of the query (شرطة)
> >
> > "id": "79",
> > "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - -
> مركز
> > شرطة إبراء"
> >
> > It has only one word of the query (شرطة)
> >
> > where the above 3 records should not come in the result since already 2
> > words mentioned in the query, and only one record has these two words.
>
> Solr's standard query language includes two mechanisms for requiring
> terms: ‘+’ before a required term, and ‘AND’ between two required terms.
>  ‘+’ is better - see  12/28/why-not-and-or-and-not/> for more information.
>
> You can also set the default operator to ‘AND’, e.g. via request parameter
> “q.op=AND” (if this is always what you want, you can include this in the
> /select request handler’s definition in solrconfig.xml).  See <
> https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser>
> for more information.
>
> > I would really suggest if we can give you a real-time demo on our system
> > with my Arab colleague so it can be more clear for you. let us know if
> we
> > can do that.
>
> I prefer to keep discussion on this public mailing list so that others can
> benefit.  If you find that you need faster or more interactive help, you
> can check out the list of people who have indicated that they provide Solr
> support: .
>
> --
> Steve
> www.lucidworks.com
>
>
>
> --
> If you reply to this email, your message will be added to the discussion
> below:
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-
> tp4317733p4323036.html
>



-- 
Regards,
Mohan.N
9865998919




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4324142.html
Sent from the Solr - User mailing list archive at 

Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Ishan Chattopadhyaya
Hi Bernd,
Can you please double check?

I downloaded the 6.4.2 tarball and see that they have 6.4.2:

[ishan@ishanvps solr-6.4.2]$ grep -rn "luceneMatchVersion" *|grep
solrconfig.xml
CHANGES.txt:1474:   or your luceneMatchVersion in the solrconfig.xml is less than 6.0
docs/changes/Changes.html:1694:schemaFactory class="ClassicIndexSchemaFactory"/ or your luceneMatchVersion in the solrconfig.xml is less than 6.0
example/files/conf/solrconfig.xml:38:  <luceneMatchVersion>6.4.2</luceneMatchVersion>
example/example-DIH/solr/tika/conf/solrconfig.xml:38:  <luceneMatchVersion>6.4.2</luceneMatchVersion>
example/example-DIH/solr/rss/conf/solrconfig.xml:38:  <luceneMatchVersion>6.4.2</luceneMatchVersion>
example/example-DIH/solr/mail/conf/solrconfig.xml:38:  <luceneMatchVersion>6.4.2</luceneMatchVersion>
example/example-DIH/solr/db/conf/solrconfig.xml:38:  <luceneMatchVersion>6.4.2</luceneMatchVersion>
example/example-DIH/solr/solr/conf/solrconfig.xml:38:  <luceneMatchVersion>6.4.2</luceneMatchVersion>
server/solr/configsets/basic_configs/conf/solrconfig.xml:38:  <luceneMatchVersion>6.4.2</luceneMatchVersion>
server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml:38:  <luceneMatchVersion>6.4.2</luceneMatchVersion>
server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml:38:  <luceneMatchVersion>6.4.2</luceneMatchVersion>


Maybe you downloaded the 6.4.1 version by mistake?
Thanks,
Ishan


On Thu, Mar 9, 2017 at 10:19 AM, Shawn Heisey  wrote:

> On 3/8/2017 2:36 AM, Bernd Fehling wrote:
> > Shouldn't in server/solr/configsets/.../solrconfig.xml
> > <luceneMatchVersion>6.4.1</luceneMatchVersion>
> > really read
> > <luceneMatchVersion>6.4.2</luceneMatchVersion>
> >
> > May be something for package builder for future releases?
>
> That does look like it got overlooked, and is generally something that
> SHOULD be changed with each new version, but in this case, changing
> between those two version numbers will have zero effect.  It is against
> project policy to make significant changes in a bugfix release (where
> third version number changes).
>
> Any change that's significant enough to be controlled by a
> luceneMatchVersion check would only be allowed a minor or major release.
>
> Thanks,
> Shawn
>
>


Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Zheng Lin Edwin Yeo
Hi,

Just to check, is the index that was indexed in Solr 6.4.1 affected by the
bug? Do we have to re-index those records when we move to Solr 6.4.2?

Regards,
Edwin


On 9 March 2017 at 12:49, Shawn Heisey  wrote:

> On 3/8/2017 2:36 AM, Bernd Fehling wrote:
> > Shouldn't in server/solr/configsets/.../solrconfig.xml
> > <luceneMatchVersion>6.4.1</luceneMatchVersion>
> > really read
> > <luceneMatchVersion>6.4.2</luceneMatchVersion>
> >
> > May be something for package builder for future releases?
>
> That does look like it got overlooked, and is generally something that
> SHOULD be changed with each new version, but in this case, changing
> between those two version numbers will have zero effect.  It is against
> project policy to make significant changes in a bugfix release (where
> third version number changes).
>
> Any change that's significant enough to be controlled by a
> luceneMatchVersion check would only be allowed a minor or major release.
>
> Thanks,
> Shawn
>
>


Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Shawn Heisey
On 3/8/2017 2:36 AM, Bernd Fehling wrote:
> Shouldn't in server/solr/configsets/.../solrconfig.xml
> <luceneMatchVersion>6.4.1</luceneMatchVersion>
> really read
> <luceneMatchVersion>6.4.2</luceneMatchVersion>
>
> May be something for package builder for future releases?

That does look like it got overlooked, and is generally something that
SHOULD be changed with each new version, but in this case, changing
between those two version numbers will have zero effect.  It is against
project policy to make significant changes in a bugfix release (where
the third version number changes).

Any change that's significant enough to be controlled by a
luceneMatchVersion check would only be allowed in a minor or major release.

Thanks,
Shawn



Re: Problems executing boolean queries involving NOT clauses

2017-03-08 Thread Sundeep T
I am just trying to clarify whether there is a bug here in Solr. It seems
that when Solr translates SQL into the underlying Solr query, it puts
parentheses around "NOT" clause expressions. But that does not seem to be
working correctly and is not returning the expected results. If the parentheses
around the "NOT" clause are removed, then correct results are returned.

On Wed, Mar 8, 2017 at 7:39 PM, Erick Erickson 
wrote:

> What _exactly_ are you testing? It's unclear whether you're asking
> about general Lucene/Solr syntax or some of the recent streaming SQL
> work.
>
> On Wed, Mar 8, 2017 at 7:34 PM, Sundeep T  wrote:
> > Hi,
> >
> > I am using solr 6.3 version.
> >
> > We are seeing issues involving NOT clauses when they are paired in
> boolean expressions. The issues specifically occur when the “NOT” clause is
> surrounded by paratheses.
> >
> > For example, the following solr query does not return any results -
> >
> > (timestamp:[* TO "2017-08-17T07:12:55.807Z"]) AND (-text:"Daemon”)
> >
> > But if I remove the parantheses around the “NOT” clause for text param
> it returns expected results. Like, the below query works as expected -
> >
> > (timestamp:[* TO "2017-08-17T07:12:55.807Z"]) AND -text:”Daemon”
> >
> > This problem seems to happen only for boolean expression queries. If i
> give a singular query like below involving NOT with parantheses, it still
> works  -
> > (-text:"Daemon”)
> >
> > I see that the parantheses around the expression is added in SQLVisitor
> class in these lines. I tried removing the parantheses for NOT case and the
> code works.
> >
> > case NOT_EQUAL:
> > buf.append('-').append(field).append(":").append(value);
> > return null;
> >
> > Any ideas what’s going on here and why parantheses are causing an issue?
> >
> > Thanks
> > Sundeep
> >
> >
>


Re: Problems executing boolean queries involving NOT clauses

2017-03-08 Thread Alexandre Rafalovitch
From the first class, it seems similar to
https://wiki.apache.org/solr/NegativeQueryProblems
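
For illustration, the usual workaround described on that page is to pair the purely
negative clause with a match-all query inside the parentheses; with the field names
from the original post, something like:

(timestamp:[* TO "2017-08-17T07:12:55.807Z"]) AND (*:* -text:"Daemon")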

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 8 March 2017 at 22:34, Sundeep T  wrote:
> Hi,
>
> I am using solr 6.3 version.
>
> We are seeing issues involving NOT clauses when they are paired in boolean 
> expressions. The issues specifically occur when the “NOT” clause is 
> surrounded by paratheses.
>
> For example, the following solr query does not return any results -
>
> (timestamp:[* TO "2017-08-17T07:12:55.807Z"]) AND (-text:"Daemon”)
>
> But if I remove the parantheses around the “NOT” clause for text param it 
> returns expected results. Like, the below query works as expected -
>
> (timestamp:[* TO "2017-08-17T07:12:55.807Z"]) AND -text:”Daemon”
>
> This problem seems to happen only for boolean expression queries. If i give a 
> singular query like below involving NOT with parantheses, it still works  -
> (-text:"Daemon”)
>
> I see that the parantheses around the expression is added in SQLVisitor class 
> in these lines. I tried removing the parantheses for NOT case and the code 
> works.
>
> case NOT_EQUAL:
> buf.append('-').append(field).append(":").append(value);
> return null;
>
> Any ideas what’s going on here and why parantheses are causing an issue?
>
> Thanks
> Sundeep
>
>


Re: Problems executing boolean queries involving NOT clauses

2017-03-08 Thread Erick Erickson
What _exactly_ are you testing? It's unclear whether you're asking
about general Lucene/Solr syntax or some of the recent streaming SQL
work.

On Wed, Mar 8, 2017 at 7:34 PM, Sundeep T  wrote:
> Hi,
>
> I am using solr 6.3 version.
>
> We are seeing issues involving NOT clauses when they are paired in boolean 
> expressions. The issues specifically occur when the “NOT” clause is 
> surrounded by paratheses.
>
> For example, the following solr query does not return any results -
>
> (timestamp:[* TO "2017-08-17T07:12:55.807Z"]) AND (-text:"Daemon”)
>
> But if I remove the parantheses around the “NOT” clause for text param it 
> returns expected results. Like, the below query works as expected -
>
> (timestamp:[* TO "2017-08-17T07:12:55.807Z"]) AND -text:”Daemon”
>
> This problem seems to happen only for boolean expression queries. If i give a 
> singular query like below involving NOT with parantheses, it still works  -
> (-text:"Daemon”)
>
> I see that the parantheses around the expression is added in SQLVisitor class 
> in these lines. I tried removing the parantheses for NOT case and the code 
> works.
>
> case NOT_EQUAL:
> buf.append('-').append(field).append(":").append(value);
> return null;
>
> Any ideas what’s going on here and why parantheses are causing an issue?
>
> Thanks
> Sundeep
>
>


Problems executing boolean queries involving NOT clauses

2017-03-08 Thread Sundeep T
Hi,

I am using solr 6.3 version.

We are seeing issues involving NOT clauses when they are paired in boolean
expressions. The issues specifically occur when the "NOT" clause is surrounded
by parentheses.

For example, the following Solr query does not return any results -

(timestamp:[* TO "2017-08-17T07:12:55.807Z"]) AND (-text:"Daemon")

But if I remove the parentheses around the "NOT" clause for the text param it
returns the expected results. Like, the query below works as expected -

(timestamp:[* TO "2017-08-17T07:12:55.807Z"]) AND -text:"Daemon"

This problem seems to happen only for boolean expression queries. If I give a
singular query like the one below involving NOT with parentheses, it still works -

(-text:"Daemon")

I see that the parentheses around the expression are added in the SQLVisitor class
in these lines. I tried removing the parentheses for the NOT case and the code
works.

case NOT_EQUAL:
buf.append('-').append(field).append(":").append(value);
return null;

Any ideas what's going on here and why parentheses are causing an issue?

Thanks
Sundeep




Re: https

2017-03-08 Thread Rick Leir
Hi pub
You need to google CORS (cross-origin resource sharing). But there is no need to worry
about CORS if all of the JavaScript for the Solr site is served from the Solr server.

But as others have said, it is best to have some PHP or Python UI in front of 
Solr.
Cheers
Rick

On March 8, 2017 2:11:36 PM EST, pubdiverses  wrote:
>Hello,
>
>I give you some more explanation.
>
>I have a site https://site.com under Apache.
>On the same physical server, i've installed solr.
>
>Inside https://site.com, i've a search form wich call solr with 
>http://xxx.xxx.xxx.xxx/solr.
>
>But the browser says : "mixt content" and blocks the call.
>
>So, i need to have something like https://xxx.xxx.xxx.xxx/solr
>
>Is it possible ?
>
>
>
>On 07/03/2017 at 22:19, Alexandre Rafalovitch wrote:
>> The first advise is NOT to expose your Solr directly to the public.
>> Anyone that can hit /search, can also hit /update and wipe out your
>> index.
>>
>> Unless you run a proper proxy that secures URLs and sanitizes the
>> parameters (in GET, in POST, escaped, etc).  And if you are doing
>> that, you can setup the HTTPS in your proxy and have it speak HTTP to
>> Solr on the backend.
>>
>> Otherwise, you need middleware, which runs on a server as well, so
>you
>> are back into configuring _that_ server (not Solr) for HTTPS.
>>
>> Regards,
>> Alex.
>> 
>> http://www.solr-start.com/ - Resources for Solr users, new and
>experienced
>>
>>
>> On 7 March 2017 at 15:45, pubdiverses  wrote:
>>> Hello,
>>>
>>> I would like to acces my solr instance with https://domain.com/solr.
>>>
>>> how to do this ?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: https

2017-03-08 Thread Shawn Heisey
On 3/8/2017 12:11 PM, pubdiverses wrote:
> I have a site https://site.com under Apache.
> On the same physical server, i've installed solr.
>
> Inside https://site.com, i've a search form wich call solr with
> http://xxx.xxx.xxx.xxx/solr.
>
> But the browser says : "mixt content" and blocks the call.
>
> So, i need to have something like https://xxx.xxx.xxx.xxx/solr
>
> Is it possible ?

Alexandre and Phil already mentioned this, but it's so critical that I'm
going to repeat it.

This is a VERY BAD idea.

This exposes Solr directly to the Internet, which means that unless you
take *SPECIAL* effort with an intelligent proxy server, which is not at
all trivial to secure properly, anybody on the Internet can change
your index, delete your index, or send denial-of-service queries that
make it so your legitimate users cannot utilize your search.

Whether HTTPS is used or not, do not expose Solr directly to the
Internet.  Instead, use server-side web application code to access it
behind the firewall.

Thanks,
Shawn



Re: Conditions for replication to copy full index

2017-03-08 Thread Shawn Heisey
On 3/6/2017 9:06 AM, Chris Ulicny wrote:
> We've recently had some issues with a 5.1.0 core copying the whole index
> when it was set to replicate from a master core.
>
> I've read that if there are documents that have been added to the slave
> core by mistake, it will do a full copy. Though we are still investigating,
> this is probably not the cause of it.
>
> Are there any other conditions in which the slave core will do a full copy
> of an index instead of only the necessary files?

There is this bug:

https://issues.apache.org/jira/browse/SOLR-9036

It lists SOLR-7134 as the cause.  The fix for the earlier issue was
released with 5.1, so your version would suffer from SOLR-9036.

The newer issue's details sound like there could be other things that
cause a full index replication in addition to a master restart.

The 5.1 version is nearly two years old.  Upgrading is advised.  You
could either go with 5.5.4 or 6.4.2.

Thanks,
Shawn



Re: Solr JDBC with Core (vs Collection)

2017-03-08 Thread Joel Bernstein
Getting streaming expression and SQL working in non-SolrCloud mode is my
top priority right now.

I'm testing the first parts of
https://issues.apache.org/jira/browse/SOLR-10200 today and will be
committing soon. The first functionality delivered will be the
significantTerms Streaming Expression. Here is a sample query:

expr=significantTerms(enron, q="from:tana.jo...@enron.com", field="to",
limit="20")&enron.shards=http://localhost:8983/solr/enron

Notice the enron.shards http param. This provides the shards for the
"enron" collection.

This will ship in Solr 6.5, as part of the first release of the significantTerms
expression.

Solr 6.6 will likely have support for all stream sources and parallel
SQL/JDBC.



Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Mar 8, 2017 at 2:19 PM, OTH  wrote:

> Hello,
>
> Yes, I was trying to use it with a non-cloud setup.
>
> Basically, our application probably won't be requiring cloud features;
> however, it would be extremely helpful to use JDBC with Solr.
>
> Of course, we don't mind using SolrCloud if that's what is needed for JDBC.
>
> Are there any drawbacks to using SolrCloud, if a distributed setup probably
> won't be required?
>
> Much thanks
>
> On Thu, Mar 9, 2017 at 12:13 AM, Alexandre Rafalovitch  >
> wrote:
>
> > I believe JDBC requires streams, which requires SolrCloud, which
> > requires Collections (even if it is a single-core collection).
> >
> > Are you trying to use it with non-cloud setup?
> >
> > Regards,
> >Alex.
> > 
> > http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
> >
> > On 8 March 2017 at 14:02, OTH  wrote:
> > > Hello,
> > >
> > > From the examples I am seeing online and in the reference guide (
> > > https://cwiki.apache.org/confluence/display/solr/Solr+
> > JDBC+-+SQuirreL+SQL),
> > > I can only see Solr JDBC being used against a collection.  Is it
> possible
> > > however to use it with a core?  What should the JDBC URL be like in
> that
> > > case?
> > >
> > > Thanks
> >
>


Re: SOLR Atomic update of custom stored metadata clears full-text index! How to add metadata without losing full-text search

2017-03-08 Thread Erick Erickson
How are you updating? All the stored-fields advice assumes you are using "Atomic Updates".



On Wed, Mar 8, 2017 at 11:15 AM, Alexandre Rafalovitch
 wrote:
> Uhm, actually, If you have copyField from multiple sources into that
> _text_ field, you may be accumulating/duplicating content on update.
>
> Check what happens to the content of that _text_ field when you do
> full-text and then do an attribute update.
>
> If I am right, you may want to have a separate "original_text" field
> that you store and then have your aggregate copyField destination not
> stored.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 8 March 2017 at 13:41, Nicolas Bouillon
>  wrote:
>> Guys
>>
>> A BIG thank you, it works perfectly!!!
>>
>> After so much research I finally got my solution working.
>>
>> That was the trick, _text_ is stored and it’s working as expected.
>>
>> Have a very nice day and thanks a lot for your contribution.
>>
>> Really appreciated
>>
>> Nico
>>> On 8 Mar 2017, at 18:26, Nicolas Bouillon  
>>> wrote:
>>>
>>> Hi Erick, Shawn,
>>>
>>> Thx really a lot for your swift reaction, it’s fantastic.
>>> Let me answer both your answers:
>>>
>>> 1) the df entry in solrconfig.xml has not been changed:
>>>
>>> _text_
>>>
>>> 2)when I do a query for full-text search I don’t specify a field, I just 
>>> enter the string I’m looking for in the q parameter:
>>>
>>> Like this: I have a ppt containing the word “Microsoft”that is called 
>>> “Dynamics 365 Roadmap”, I do a query on “Microsoft”and it finds the document
>>> After update, it doesn’t find it unless I search for one of my custom 
>>> fields or something in the title like “Dynamics”
>>>
>>> So, my conclusion would be that you suggest I mark “_text_” as stored=true 
>>> in the schema, right?
>>> And reload core or even re-index.
>>>
>>> Thx a bunch
>>>
>>>
>>>
>>>
 On 8 Mar 2017, at 17:46, Erick Erickson  wrote:

 bq: I wonder if it won’t be simpler for me to write a custom handler

 Probably not, that would be Java too ;)...

 OK, back up a bit. You can change your schema such that the full-text
 field _is_ stored, I don't quite know what the default field is from
 memory, but you must be searching against it ;). It sounds like you're
 using the defaults and it's _probably_ _text_. And my guess is that
 you're searching on that field even though you don't specify, see the
 "df" entry in your solrconfig.xml file. There's no reason you can't
 change that to stored="true" (reindex of course).

 Nothing that you've mentioned so far looks like it should take
 anything except getting your configurations to be what you need, so
 don't make more work for yourself than you need to ;).

 After that, see the link Shawn provided...

 Best,
 Erick

 On Wed, Mar 8, 2017 at 8:22 AM, Nicolas Bouillon
  wrote:
> Hi Erick
>
> Thanks a lot for the elaborated answer. Let me give some precisions:
>
> 1. I upload the docs using an AJAX post multiform to my server.
> 2. The PHP target of the post, takes the file and stores it on disk
> 3. If the file is moved successfully from TEMP files to final 
> destination, I then call SOLR as follows:
>
> It’s a curl POST request:
>
> URL: http://my_server:8983/solr/my_core/update/extract/?; . $fields . 
> "=" . $id . "=*=true
> HEADER: Content-type: multipart/form-data
> POSTFIELDS: the entire file that has just been stored
> (BTW, it’s PHP specific but I send a CurlFile in an array as follows: 
> array('myfile' => $cfile)
>
> In the URL, the parameter $fields contains the following:
>
> $fields = "literal.kref=" . $id . "=" . $type . 
> "=" . $attachment;
>
> Where kref, ktype and kattachment are my custom fields (that I added to 
> the schema.xml previously)
>
> So, indeed it’s Tika that extracts the info. I didn’t change anything to 
> the ExtractHandler.
>
> I read about the fact that all fields must be marked as stored=true but:
>
> - I checked in the schema, all the fields that matter (Tika default 
> extracted fields) and my customer fields are stored=true.
> - I suppose that the full-text index is not stored in a field? And 
> therefore cannot be marked as stored?
>
> I manage to upload files and mark my docs with metadata but I have 
> existing files where I would like to update my fields (kref, …) without 
> re-extracting and I’d like also to allow for re-indexing if needed 
> without overriding my fields.
>
> I’m stuck… I wonder if it won’t be simpler for me to write a custom 
> handler of some sort but I don’t really program in Java.
>
> Cheers
>
> Nico
>

RE: https

2017-03-08 Thread Phil Scadden
What we are suggesting is that your browser does NOT access solr directly at 
all. In fact, configure firewall so that SOLR is unreachable outside the 
server. Instead you write a proxy in your site application which calls SOLR 
instead. Ie a server-to-server call instead of browser-to-server. This is a 
much more secure setup and allows you to "vet" query requests, potentially 
distribute to different cores on some application logic etc. Shouldn’t be hard 
to find a skeleton proxy code in whatever your site application is written in.
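
As an illustration only, a skeleton of that idea in Java (the servlet path, core name
and the fixed parameter list are assumptions, not taken from this thread; the same
shape works in PHP, Python or whatever the site application is written in):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/search")
public class SolrSearchProxy extends HttpServlet {
    // Solr stays bound to localhost / behind the firewall; only this servlet is public.
    private static final String SOLR_SELECT = "http://localhost:8983/solr/mycore/select";

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws java.io.IOException {
        String q = req.getParameter("q");
        if (q == null || q.trim().isEmpty()) {
            resp.sendError(HttpServletResponse.SC_BAD_REQUEST, "missing q parameter");
            return;
        }
        // Forward only the vetted user query; handler, rows, etc. stay under our control.
        String url = SOLR_SELECT + "?wt=json&rows=10&q=" + URLEncoder.encode(q, "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        resp.setContentType("application/json;charset=UTF-8");
        try (InputStream in = conn.getInputStream(); OutputStream out = resp.getOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}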

-Original Message-
From: pubdiverses [mailto:pubdiver...@free.fr]
Sent: Thursday, 9 March 2017 8:12 a.m.
To: solr-user@lucene.apache.org
Subject: Re: https

Hello,

I give you some more explanation.

I have a site https://site.com under Apache.
On the same physical server, i've installed solr.

Inside https://site.com, i've a search form wich call solr with 
http://xxx.xxx.xxx.xxx/solr.

But the browser says : "mixt content" and blocks the call.

So, i need to have something like https://xxx.xxx.xxx.xxx/solr

Is it possible ?



On 07/03/2017 at 22:19, Alexandre Rafalovitch wrote:
> The first advise is NOT to expose your Solr directly to the public.
> Anyone that can hit /search, can also hit /update and wipe out your
> index.
>
> Unless you run a proper proxy that secures URLs and sanitizes the
> parameters (in GET, in POST, escaped, etc).  And if you are doing
> that, you can setup the HTTPS in your proxy and have it speak HTTP to
> Solr on the backend.
>
> Otherwise, you need middleware, which runs on a server as well, so you
> are back into configuring _that_ server (not Solr) for HTTPS.
>
> Regards,
> Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
>
>
> On 7 March 2017 at 15:45, pubdiverses  wrote:
>> Hello,
>>
>> I would like to acces my solr instance with https://domain.com/solr.
>>
>> how to do this ?

Notice: This email and any attachments are confidential and may not be used, 
published or redistributed without the prior written consent of the Institute 
of Geological and Nuclear Sciences Limited (GNS Science). If received in error 
please destroy and immediately notify GNS Science. Do not copy or disclose the 
contents.


Re: Solr JDBC with Core (vs Collection)

2017-03-08 Thread OTH
Hello,

Yes, I was trying to use it with a non-cloud setup.

Basically, our application probably won't be requiring cloud features;
however, it would be extremely helpful to use JDBC with Solr.

Of course, we don't mind using SolrCloud if that's what is needed for JDBC.

Are there any drawbacks to using SolrCloud, if a distributed setup probably
won't be required?

Much thanks

On Thu, Mar 9, 2017 at 12:13 AM, Alexandre Rafalovitch 
wrote:

> I believe JDBC requires streams, which requires SolrCloud, which
> requires Collections (even if it is a single-core collection).
>
> Are you trying to use it with non-cloud setup?
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 8 March 2017 at 14:02, OTH  wrote:
> > Hello,
> >
> > From the examples I am seeing online and in the reference guide (
> > https://cwiki.apache.org/confluence/display/solr/Solr+
> JDBC+-+SQuirreL+SQL),
> > I can only see Solr JDBC being used against a collection.  Is it possible
> > however to use it with a core?  What should the JDBC URL be like in that
> > case?
> >
> > Thanks
>


Re: Solr JDBC with Core (vs Collection)

2017-03-08 Thread Dennis Gove
I don't have an answer to the original question, but I would like to point
out that work is being done to make streaming available outside of
SolrCloud under ticket https://issues.apache.org/jira/browse/SOLR-10200.

- Dennis

On Wed, Mar 8, 2017 at 2:13 PM, Alexandre Rafalovitch 
wrote:

> I believe JDBC requires streams, which requires SolrCloud, which
> requires Collections (even if it is a single-core collection).
>
> Are you trying to use it with non-cloud setup?
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 8 March 2017 at 14:02, OTH  wrote:
> > Hello,
> >
> > From the examples I am seeing online and in the reference guide (
> > https://cwiki.apache.org/confluence/display/solr/Solr+
> JDBC+-+SQuirreL+SQL),
> > I can only see Solr JDBC being used against a collection.  Is it possible
> > however to use it with a core?  What should the JDBC URL be like in that
> > case?
> >
> > Thanks
>


Re: SOLR Atomic update of custom stored metadata clears full-text index! How to add metadata without losing full-text search

2017-03-08 Thread Alexandre Rafalovitch
Uhm, actually, If you have copyField from multiple sources into that
_text_ field, you may be accumulating/duplicating content on update.

Check what happens to the content of that _text_ field when you do
full-text and then do an attribute update.

If I am right, you may want to have a separate "original_text" field
that you store and then have your aggregate copyField destination not
stored.
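
For illustration, the field layout being suggested might look roughly like this in the
schema (the types and copyField sources are placeholders, not taken from the thread):

<field name="original_text" type="text_general" indexed="false" stored="true" multiValued="true"/>
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="original_text" dest="_text_"/>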

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 8 March 2017 at 13:41, Nicolas Bouillon
 wrote:
> Guys
>
> A BIG thank you, it works perfectly!!!
>
> After so much research I finally got my solution working.
>
> That was the trick, _text_ is stored and it’s working as expected.
>
> Have a very nice day and thanks a lot for your contribution.
>
> Really appreciated
>
> Nico
>> On 8 Mar 2017, at 18:26, Nicolas Bouillon  
>> wrote:
>>
>> Hi Erick, Shawn,
>>
>> Thx really a lot for your swift reaction, it’s fantastic.
>> Let me answer both your answers:
>>
>> 1) the df entry in solrconfig.xml has not been changed:
>>
>> _text_
>>
>> 2)when I do a query for full-text search I don’t specify a field, I just 
>> enter the string I’m looking for in the q parameter:
>>
>> Like this: I have a ppt containing the word “Microsoft”that is called 
>> “Dynamics 365 Roadmap”, I do a query on “Microsoft”and it finds the document
>> After update, it doesn’t find it unless I search for one of my custom fields 
>> or something in the title like “Dynamics”
>>
>> So, my conclusion would be that you suggest I mark “_text_” as stored=true 
>> in the schema, right?
>> And reload core or even re-index.
>>
>> Thx a bunch
>>
>>
>>
>>
>>> On 8 Mar 2017, at 17:46, Erick Erickson  wrote:
>>>
>>> bq: I wonder if it won’t be simpler for me to write a custom handler
>>>
>>> Probably not, that would be Java too ;)...
>>>
>>> OK, back up a bit. You can change your schema such that the full-text
>>> field _is_ stored, I don't quite know what the default field is from
>>> memory, but you must be searching against it ;). It sounds like you're
>>> using the defaults and it's _probably_ _text_. And my guess is that
>>> you're searching on that field even though you don't specify, see the
>>> "df" entry in your solrconfig.xml file. There's no reason you can't
>>> change that to stored="true" (reindex of course).
>>>
>>> Nothing that you've mentioned so far looks like it should take
>>> anything except getting your configurations to be what you need, so
>>> don't make more work for yourself than you need to ;).
>>>
>>> After that, see the link Shawn provided...
>>>
>>> Best,
>>> Erick
>>>
>>> On Wed, Mar 8, 2017 at 8:22 AM, Nicolas Bouillon
>>>  wrote:
 Hi Erick

 Thanks a lot for the elaborated answer. Let me give some precisions:

 1. I upload the docs using an AJAX post multiform to my server.
 2. The PHP target of the post, takes the file and stores it on disk
 3. If the file is moved successfully from TEMP files to final destination, 
 I then call SOLR as follows:

 It’s a curl POST request:

 URL: http://my_server:8983/solr/my_core/update/extract/?; . $fields . 
 "=" . $id . "=*=true
 HEADER: Content-type: multipart/form-data
 POSTFIELDS: the entire file that has just been stored
 (BTW, it’s PHP specific but I send a CurlFile in an array as follows: 
 array('myfile' => $cfile)

 In the URL, the parameter $fields contains the following:

 $fields = "literal.kref=" . $id . "=" . $type . 
 "=" . $attachment;

 Where kref, ktype and kattachment are my custom fields (that I added to 
 the schema.xml previously)

 So, indeed it’s Tika that extracts the info. I didn’t change anything to 
 the ExtractHandler.

 I read about the fact that all fields must be marked as stored=true but:

 - I checked in the schema, all the fields that matter (Tika default 
 extracted fields) and my customer fields are stored=true.
 - I suppose that the full-text index is not stored in a field? And 
 therefore cannot be marked as stored?

 I manage to upload files and mark my docs with metadata but I have 
 existing files where I would like to update my fields (kref, …) without 
 re-extracting and I’d like also to allow for re-indexing if needed without 
 overriding my fields.

 I’m stuck… I wonder if it won’t be simpler for me to write a custom 
 handler of some sort but I don’t really program in Java.

 Cheers

 Nico

> On 8 Mar 2017, at 17:03, Erick Erickson  wrote:
>
> Nico:
>
> This is the place  for such questions! I'm not quite sure the source
> of the docs. When you say you "extract", does that mean you're using
> the ExtractingRequestHandler, i.e. uploading PDF or 

Re: Solr JDBC with Core (vs Collection)

2017-03-08 Thread Alexandre Rafalovitch
I believe JDBC requires streams, which requires SolrCloud, which
requires Collections (even if it is a single-core collection).

Are you trying to use it with non-cloud setup?
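
For illustration, a minimal sketch of the SolrCloud/collection form of the JDBC
connection (the ZooKeeper address and collection name are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrJdbcSketch {
    public static void main(String[] args) throws Exception {
        // The driver ships with solr-solrj and talks to a SolrCloud collection via ZooKeeper.
        Class.forName("org.apache.solr.client.solrj.io.sql.DriverImpl");
        String url = "jdbc:solr://localhost:9983?collection=mycollection";
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id FROM mycollection LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("id"));
            }
        }
    }
}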

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 8 March 2017 at 14:02, OTH  wrote:
> Hello,
>
> From the examples I am seeing online and in the reference guide (
> https://cwiki.apache.org/confluence/display/solr/Solr+JDBC+-+SQuirreL+SQL),
> I can only see Solr JDBC being used against a collection.  Is it possible
> however to use it with a core?  What should the JDBC URL be like in that
> case?
>
> Thanks


Re: https

2017-03-08 Thread pubdiverses

Hello,

Let me give you some more explanation.

I have a site https://site.com under Apache.
On the same physical server, I've installed Solr.

Inside https://site.com, I have a search form which calls Solr with
http://xxx.xxx.xxx.xxx/solr.

But the browser says "mixed content" and blocks the call.

So, I need to have something like https://xxx.xxx.xxx.xxx/solr.

Is it possible?



On 07/03/2017 at 22:19, Alexandre Rafalovitch wrote:

The first advise is NOT to expose your Solr directly to the public.
Anyone that can hit /search, can also hit /update and wipe out your
index.

Unless you run a proper proxy that secures URLs and sanitizes the
parameters (in GET, in POST, escaped, etc).  And if you are doing
that, you can setup the HTTPS in your proxy and have it speak HTTP to
Solr on the backend.

Otherwise, you need middleware, which runs on a server as well, so you
are back into configuring _that_ server (not Solr) for HTTPS.

Regards,
Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 7 March 2017 at 15:45, pubdiverses  wrote:

Hello,

I would like to acces my solr instance with https://domain.com/solr.

how to do this ?




Solr JDBC with Core (vs Collection)

2017-03-08 Thread OTH
Hello,

From the examples I am seeing online and in the reference guide (
https://cwiki.apache.org/confluence/display/solr/Solr+JDBC+-+SQuirreL+SQL),
I can only see Solr JDBC being used against a collection.  Is it possible
however to use it with a core?  What should the JDBC URL be like in that
case?

Thanks


Re: SOLR Atomic update of custom stored metadata clears full-text index! How to add metadata without losing full-text search

2017-03-08 Thread Nicolas Bouillon
Guys

A BIG thank you, it works perfectly!!!

After so much research I finally got my solution working.

That was the trick, _text_ is stored and it’s working as expected.

Have a very nice day and thanks a lot for your contribution.

Really appreciated

Nico
> On 8 Mar 2017, at 18:26, Nicolas Bouillon  
> wrote:
> 
> Hi Erick, Shawn,
> 
> Thx really a lot for your swift reaction, it’s fantastic.
> Let me answer both your answers:
> 
> 1) the df entry in solrconfig.xml has not been changed:
> 
> _text_
> 
> 2)when I do a query for full-text search I don’t specify a field, I just 
> enter the string I’m looking for in the q parameter:
> 
> Like this: I have a ppt containing the word “Microsoft”that is called 
> “Dynamics 365 Roadmap”, I do a query on “Microsoft”and it finds the document
> After update, it doesn’t find it unless I search for one of my custom fields 
> or something in the title like “Dynamics”
> 
> So, my conclusion would be that you suggest I mark “_text_” as stored=true in 
> the schema, right?
> And reload core or even re-index.
> 
> Thx a bunch
> 
> 
> 
> 
>> On 8 Mar 2017, at 17:46, Erick Erickson  wrote:
>> 
>> bq: I wonder if it won’t be simpler for me to write a custom handler
>> 
>> Probably not, that would be Java too ;)...
>> 
>> OK, back up a bit. You can change your schema such that the full-text
>> field _is_ stored, I don't quite know what the default field is from
>> memory, but you must be searching against it ;). It sounds like you're
>> using the defaults and it's _probably_ _text_. And my guess is that
>> you're searching on that field even though you don't specify, see the
>> "df" entry in your solrconfig.xml file. There's no reason you can't
>> change that to stored="true" (reindex of course).
>> 
>> Nothing that you've mentioned so far looks like it should take
>> anything except getting your configurations to be what you need, so
>> don't make more work for yourself than you need to ;).
>> 
>> After that, see the link Shawn provided...
>> 
>> Best,
>> Erick
>> 
>> On Wed, Mar 8, 2017 at 8:22 AM, Nicolas Bouillon
>>  wrote:
>>> Hi Erick
>>> 
>>> Thanks a lot for the elaborated answer. Let me give some precisions:
>>> 
>>> 1. I upload the docs using an AJAX post multiform to my server.
>>> 2. The PHP target of the post, takes the file and stores it on disk
>>> 3. If the file is moved successfully from TEMP files to final destination, 
>>> I then call SOLR as follows:
>>> 
>>> It’s a curl POST request:
>>> 
>>> URL: http://my_server:8983/solr/my_core/update/extract/?; . $fields . 
>>> "=" . $id . "=*=true
>>> HEADER: Content-type: multipart/form-data
>>> POSTFIELDS: the entire file that has just been stored
>>> (BTW, it’s PHP specific but I send a CurlFile in an array as follows: 
>>> array('myfile' => $cfile)
>>> 
>>> In the URL, the parameter $fields contains the following:
>>> 
>>> $fields = "literal.kref=" . $id . "=" . $type . 
>>> "=" . $attachment;
>>> 
>>> Where kref, ktype and kattachment are my custom fields (that I added to the 
>>> schema.xml previously)
>>> 
>>> So, indeed it’s Tika that extracts the info. I didn’t change anything to 
>>> the ExtractHandler.
>>> 
>>> I read about the fact that all fields must be marked as stored=true but:
>>> 
>>> - I checked in the schema, all the fields that matter (Tika default 
>>> extracted fields) and my customer fields are stored=true.
>>> - I suppose that the full-text index is not stored in a field? And 
>>> therefore cannot be marked as stored?
>>> 
>>> I manage to upload files and mark my docs with metadata but I have existing 
>>> files where I would like to update my fields (kref, …) without 
>>> re-extracting and I’d like also to allow for re-indexing if needed without 
>>> overriding my fields.
>>> 
>>> I’m stuck… I wonder if it won’t be simpler for me to write a custom handler 
>>> of some sort but I don’t really program in Java.
>>> 
>>> Cheers
>>> 
>>> Nico
>>> 
 On 8 Mar 2017, at 17:03, Erick Erickson  wrote:
 
 Nico:
 
 This is the place  for such questions! I'm not quite sure the source
 of the docs. When you say you "extract", does that mean you're using
 the ExtractingRequestHandler, i.e. uploading PDF or Word etc. to Solr
 and letting Tika parse it out? IOW, where is the fulltext coming from?
 
 For adding tags any time, Solr has "Atomic Updates" that has a couple
 of requirements, mainly you have to set stored="true" for all your
 fields _except_ the destinations for any  directives. Under
 the covers this pulls the stored data from Solr, overlays it with the
 new data you've sent and re-indexes it. The expense here is that your
 index will increase in size, but storing the data doesn't mean much of
 an increase in JVM requirements. That is, say your index doubles in
 size. Your JVM heap requirements may increase 5% (and, 

honouring terms.limit parameter in a distributed search for /terms api

2017-03-08 Thread radha krishnan
Hi,

In TermsComponent.java's createShardQuery, the motive for setting
terms.limit to -1 is clearly explained in a Java comment.

But we have a use case where we have thousands of terms and we want each core
to return only the number of terms specified by terms.limit.

Can we have two flavours of TermsComponent:
1. one that specifies terms.limit = -1 (which is the current behaviour), and
2. one that honours the terms.limit passed by the caller? Something like the code below.


import java.io.IOException;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.params.TermsParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.ShardRequest;
import org.apache.solr.handler.component.TermsComponent;

public class LimitBasedTermsComponent extends TermsComponent {

    @Override
    public int distributedProcess(ResponseBuilder rb) throws IOException {
        if (!rb.doTerms) {
            return ResponseBuilder.STAGE_DONE;
        }

        if (rb.stage == ResponseBuilder.STAGE_EXECUTE_QUERY) {
            TermsHelper th = rb._termsHelper;
            if (th == null) {
                th = rb._termsHelper = new TermsHelper();
                th.init(rb.req.getParams());
            }
            ShardRequest sreq = createShardQuery(rb.req.getParams());
            rb.addRequest(this, sreq);
        }

        if (rb.stage < ResponseBuilder.STAGE_EXECUTE_QUERY) {
            return ResponseBuilder.STAGE_EXECUTE_QUERY;
        } else {
            return ResponseBuilder.STAGE_DONE;
        }
    }

    private ShardRequest createShardQuery(SolrParams params) {
        ShardRequest sreq = new ShardRequest();
        sreq.purpose = ShardRequest.PURPOSE_GET_TERMS;

        // base the shard request on the original parameters
        sreq.params = new ModifiableSolrParams(params);

        // if TermsParams.TERMS_LIMIT is present in the params, use that value,
        // else default it to 10
        sreq.params.remove(TermsParams.TERMS_MAXCOUNT);
        sreq.params.remove(TermsParams.TERMS_MINCOUNT);
        sreq.params.set(TermsParams.TERMS_LIMIT,
                params.getInt(TermsParams.TERMS_LIMIT, 10));
        sreq.params.set(TermsParams.TERMS_SORT, TermsParams.TERMS_SORT_INDEX);

        return sreq;
    }
}



Thanks,
Radhakrishnan D


Re: SOLR Atomic update of custom stored metadata clears full-text index! How to add metadata without losing full-text search

2017-03-08 Thread Nicolas Bouillon
Hi Erick, Shawn,

Thx really a lot for your swift reaction, it’s fantastic.
Let me answer both your answers:

1) the df entry in solrconfig.xml has not been changed:

<str name="df">_text_</str>

2)when I do a query for full-text search I don’t specify a field, I just enter 
the string I’m looking for in the q parameter:

Like this: I have a ppt containing the word “Microsoft” that is called “Dynamics
365 Roadmap”. I do a query on “Microsoft” and it finds the document.
After the update, it doesn’t find it unless I search for one of my custom fields or
something in the title like “Dynamics”.

So, my conclusion would be that you suggest I mark “_text_” as stored=true in 
the schema, right?
And reload core or even re-index.

Thx a bunch




> On 8 Mar 2017, at 17:46, Erick Erickson  wrote:
> 
> bq: I wonder if it won’t be simpler for me to write a custom handler
> 
> Probably not, that would be Java too ;)...
> 
> OK, back up a bit. You can change your schema such that the full-text
> field _is_ stored, I don't quite know what the default field is from
> memory, but you must be searching against it ;). It sounds like you're
> using the defaults and it's _probably_ _text_. And my guess is that
> you're searching on that field even though you don't specify, see the
> "df" entry in your solrconfig.xml file. There's no reason you can't
> change that to stored="true" (reindex of course).
> 
> Nothing that you've mentioned so far looks like it should take
> anything except getting your configurations to be what you need, so
> don't make more work for yourself than you need to ;).
> 
> After that, see the link Shawn provided...
> 
> Best,
> Erick
> 
> On Wed, Mar 8, 2017 at 8:22 AM, Nicolas Bouillon
>  wrote:
>> Hi Erick
>> 
>> Thanks a lot for the elaborated answer. Let me give some precisions:
>> 
>> 1. I upload the docs using an AJAX post multiform to my server.
>> 2. The PHP target of the post, takes the file and stores it on disk
>> 3. If the file is moved successfully from TEMP files to final destination, I 
>> then call SOLR as follows:
>> 
>> It’s a curl POST request:
>> 
>> URL: http://my_server:8983/solr/my_core/update/extract/?; . $fields . 
>> "=" . $id . "=*=true
>> HEADER: Content-type: multipart/form-data
>> POSTFIELDS: the entire file that has just been stored
>> (BTW, it’s PHP specific but I send a CurlFile in an array as follows: 
>> array('myfile' => $cfile)
>> 
>> In the URL, the parameter $fields contains the following:
>> 
>> $fields = "literal.kref=" . $id . "=" . $type . 
>> "=" . $attachment;
>> 
>> Where kref, ktype and kattachment are my custom fields (that I added to the 
>> schema.xml previously)
>> 
>> So, indeed it’s Tika that extracts the info. I didn’t change anything to the 
>> ExtractHandler.
>> 
>> I read about the fact that all fields must be marked as stored=true but:
>> 
>> - I checked in the schema, all the fields that matter (Tika default 
>> extracted fields) and my customer fields are stored=true.
>> - I suppose that the full-text index is not stored in a field? And therefore 
>> cannot be marked as stored?
>> 
>> I manage to upload files and mark my docs with metadata but I have existing 
>> files where I would like to update my fields (kref, …) without re-extracting 
>> and I’d like also to allow for re-indexing if needed without overriding my 
>> fields.
>> 
>> I’m stuck… I wonder if it won’t be simpler for me to write a custom handler 
>> of some sort but I don’t really program in Java.
>> 
>> Cheers
>> 
>> Nico
>> 
>>> On 8 Mar 2017, at 17:03, Erick Erickson  wrote:
>>> 
>>> Nico:
>>> 
>>> This is the place  for such questions! I'm not quite sure the source
>>> of the docs. When you say you "extract", does that mean you're using
>>> the ExtractingRequestHandler, i.e. uploading PDF or Word etc. to Solr
>>> and letting Tika parse it out? IOW, where is the fulltext coming from?
>>> 
>>> For adding tags any time, Solr has "Atomic Updates" that has a couple
>>> of requirements, mainly you have to set stored="true" for all your
>>> fields _except_ the destinations for any  directives. Under
>>> the covers this pulls the stored data from Solr, overlays it with the
>>> new data you've sent and re-indexes it. The expense here is that your
>>> index will increase in size, but storing the data doesn't mean much of
>>> an increase in JVM requirements. That is, say your index doubles in
>>> size. Your JVM heap requirements may increase 5% (and, frankly I doubt
>>> that much, but I've never measured). FWIW, the on-disk size should
>>> increase by roughly 50% of the raw data size. WARNING: "raw data size"
>>> is the size _after_ extraction, so say you're indexing a 1K XML doc
>>> where the tags are taking up .75K. Then the on-disk memory should go
>>> up roughly .125K (50% of .25K)..
>>> 
>>> Don't worry about "thousands" of docs ;) On my laptop I index over 1K
>>> Wikipedia articles a second (YMMV of course). Without any 

Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Caruana, Matthew
Hi Shawn,

These are the facts:

With Solr 6.4.1, we started the optimisation of a 200gb index with 67 segments. 
This did not trigger replication. It took a few days. We confirmed that the 
bottleneck was the CPU (optimisation is not parallelised).

We manually triggered replication of the optimised index to another Solr 6.4.1 
instance, over a gigabit LAN. This took 45 hours before failing on the final 
file (the schema).

We upgraded both instances to 6.4.2 and started replication again. This took 
about 1.5 hours. Same index, same disks, same configuration, same network.

Matthew

> On 8 Mar 2017, at 5:25 pm, Shawn Heisey  wrote:
> 
>> On 3/8/2017 5:30 AM, Caruana, Matthew wrote:
>> After upgrading to 6.4.2 from 6.4.1, we’ve seen replication time for a
>> 200gb index decrease from 45 hours to 1.5 hours. 
> 
> Just to check how long it takes to move a large amount of data over a
> network, I started a copy of a 32GB directory over a 100Mb/s network
> using a Windows client and a Samba server.  It said it would take 50
> minutes.  At this rate, copying 200GB would take over five hours.  This
> is quite a bit longer than I expected, but I hadn't done the math to
> check transfer rate against size.
> 
> Assuming that you actually intended to use the word "replication" there
> (and not something like "rebuild"), this tells me that your network is
> considerably faster than 100 megabits per second, probably gigabit, and
> that the bottleneck is the speed of the disks.
> 
> I see a previous thread where you asked about optimization performance,
> so it sounds like you are optimizing the master index which causes a
> full replication to slaves.  This is one of the reasons that
> optimization is generally not recommended except on very small indexes
> or indexes that do not change very often.
> 
> Thanks,
> Shawn
> 


Re: SOLR Atomic update of custom stored metadata clears full-text index! How to add metadata without losing full-text search

2017-03-08 Thread Erick Erickson
bq: I wonder if it won’t be simpler for me to write a custom handler

Probably not, that would be Java too ;)...

OK, back up a bit. You can change your schema such that the full-text
field _is_ stored, I don't quite know what the default field is from
memory, but you must be searching against it ;). It sounds like you're
using the defaults and it's _probably_ _text_. And my guess is that
you're searching on that field even though you don't specify, see the
"df" entry in your solrconfig.xml file. There's no reason you can't
change that to stored="true" (reindex of course).
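
For illustration, that change would look roughly like this in the schema (the field and
type names here are the stock ones, not necessarily what this index uses):

<field name="_text_" type="text_general" indexed="true" stored="true" multiValued="true"/>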

Nothing that you've mentioned so far looks like it should take
anything except getting your configurations to be what you need, so
don't make more work for yourself than you need to ;).

After that, see the link Shawn provided...

Best,
Erick

On Wed, Mar 8, 2017 at 8:22 AM, Nicolas Bouillon
 wrote:
> Hi Erick
>
> Thanks a lot for the elaborated answer. Let me give some precisions:
>
> 1. I upload the docs using an AJAX post multiform to my server.
> 2. The PHP target of the post, takes the file and stores it on disk
> 3. If the file is moved successfully from TEMP files to final destination, I 
> then call SOLR as follows:
>
> It’s a curl POST request:
>
> URL: http://my_server:8983/solr/my_core/update/extract/?; . $fields . 
> "=" . $id . "=*=true
> HEADER: Content-type: multipart/form-data
> POSTFIELDS: the entire file that has just been stored
> (BTW, it’s PHP specific but I send a CurlFile in an array as follows: 
> array('myfile' => $cfile)
>
> In the URL, the parameter $fields contains the following:
>
> $fields = "literal.kref=" . $id . "=" . $type . 
> "=" . $attachment;
>
> Where kref, ktype and kattachment are my custom fields (that I added to the 
> schema.xml previously)
>
> So, indeed it’s Tika that extracts the info. I didn’t change anything to the 
> ExtractHandler.
>
> I read about the fact that all fields must be marked as stored=true but:
>
> - I checked in the schema, all the fields that matter (Tika default extracted 
> fields) and my customer fields are stored=true.
> - I suppose that the full-text index is not stored in a field? And therefore 
> cannot be marked as stored?
>
> I manage to upload files and mark my docs with metadata but I have existing 
> files where I would like to update my fields (kref, …) without re-extracting 
> and I’d like also to allow for re-indexing if needed without overriding my 
> fields.
>
> I’m stuck… I wonder if it won’t be simpler for me to write a custom handler 
> of some sort but I don’t really program in Java.
>
> Cheers
>
> Nico
>
>> On 8 Mar 2017, at 17:03, Erick Erickson  wrote:
>>
>> Nico:
>>
>> This is the place  for such questions! I'm not quite sure the source
>> of the docs. When you say you "extract", does that mean you're using
>> the ExtractingRequestHandler, i.e. uploading PDF or Word etc. to Solr
>> and letting Tika parse it out? IOW, where is the fulltext coming from?
>>
>> For adding tags any time, Solr has "Atomic Updates" that has a couple
>> of requirements, mainly you have to set stored="true" for all your
>> fields _except_ the destinations for any  directives. Under
>> the covers this pulls the stored data from Solr, overlays it with the
>> new data you've sent and re-indexes it. The expense here is that your
>> index will increase in size, but storing the data doesn't mean much of
>> an increase in JVM requirements. That is, say your index doubles in
>> size. Your JVM heap requirements may increase 5% (and, frankly I doubt
>> that much, but I've never measured). FWIW, the on-disk size should
>> increase by roughly 50% of the raw data size. WARNING: "raw data size"
>> is the size _after_ extraction, so say you're indexing a 1K XML doc
>> where the tags are taking up .75K. Then the on-disk memory should go
>> up roughly .125K (50% of .25K)..
>>
>> Don't worry about "thousands" of docs ;) On my laptop I index over 1K
>> Wikipedia articles a second (YMMV of course). Without any particular
>> tuning. Without sharding. Very often the most expensive part of
>> indexing is acquiring the data in the first place, i.e. getting it
>> from a DB or extracting it from Tika. Solr will handle quite a load.
>>
>> And, if you're using the ExtractingRequestHandler, I'd seriously think
>> about moving it to a Client. Here's a Java example:
>> https://lucidworks.com/2012/02/14/indexing-with-solrj/
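
For illustration, a rough SolrJ sketch of that approach (paths, ids and field values are
placeholders; the literal.* fields are the custom ones mentioned in this thread):

import java.io.File;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractUploadSketch {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/my_core").build();
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("/path/to/doc.pdf"), "application/pdf");
        req.setParam("literal.id", "82");            // unique key
        req.setParam("literal.kref", "123");         // custom metadata fields from the thread
        req.setParam("literal.ktype", "invoice");
        req.setParam("literal.kattachment", "yes");
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(req);
        solr.close();
    }
}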
>>
>> Best,
>> Erick
>>
>> On Wed, Mar 8, 2017 at 7:46 AM, Nicolas Bouillon
>>  wrote:
>>> Dear SOLR friends,
>>>
>>> I developed a small ERP. I produce PDF documents linked to objects in my 
>>> ERP: invoices, timesheets, contracts, etc...
>>> I have also the possibility to attach documents to a particular object and 
>>> when I view an invoice for instance, I can see the attached documents.
>>>
>>> Until now, I was adding reference to these documents in my DB and store 
>>> docs on the server.

Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Walter Underwood
During the replication, check the disk, network, and CPU utilization. One of 
them is the bottleneck.

If the disk is at 100%, you are OK. If the network is at 100%, you are OK. If 
neither of them is at 100% and there is lots of CPU used (up to 100% of one 
core), then Solr is the bottleneck and it needs more performance work.
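
For a quick command-line look on a Linux box (assuming the sysstat tools are installed;
a monitoring service like New Relic works just as well):

iostat -xm 5     # disk utilization per device (%util column)
sar -n DEV 5     # network throughput per interface
top              # CPU use per process / per core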

We are using New Relic for monitoring. That makes this sort of check very easy.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 8, 2017, at 8:24 AM, Shawn Heisey  wrote:
> 
> On 3/8/2017 5:30 AM, Caruana, Matthew wrote:
>> After upgrading to 6.4.2 from 6.4.1, we’ve seen replication time for a
>> 200gb index decrease from 45 hours to 1.5 hours. 
> 
> Just to check how long it takes to move a large amount of data over a
> network, I started a copy of a 32GB directory over a 100Mb/s network
> using a Windows client and a Samba server.  It said it would take 50
> minutes.  At this rate, copying 200GB would take over five hours.  This
> is quite a bit longer than I expected, but I hadn't done the math to
> check transfer rate against size.
> 
> Assuming that you actually intended to use the word "replication" there
> (and not something like "rebuild"), this tells me that your network is
> considerably faster than 100 megabits per second, probably gigabit, and
> that the bottleneck is the speed of the disks.
> 
> I see a previous thread where you asked about optimization performance,
> so it sounds like you are optimizing the master index which causes a
> full replication to slaves.  This is one of the reasons that
> optimization is generally not recommended except on very small indexes
> or indexes that do not change very often.
> 
> Thanks,
> Shawn
> 



Re: SOLR Atomic update of custom stored metadata clears full-text index! How to add metadata without losing full-text search

2017-03-08 Thread Shawn Heisey
On 3/8/2017 9:22 AM, Nicolas Bouillon wrote:
> - I checked in the schema, all the fields that matter (Tika default
> extracted fields) and my customer fields are stored=true. - I suppose
> that the full-text index is not stored in a field? And 

When you do a full-text query, which field or fields are being searched?  If
those fields are standard fields (not copyField destinations) and are not
stored, then all of that data will be lost when you do an atomic update,
and the document will no longer be found by a full-text search after
it is updated.

The requirements for Atomic Update to avoid data loss are quite
explicit.  Look for the "Field Storage" section here:

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-AtomicUpdates
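
As a quick way to see what you actually have, the Schema API can list the
field and copyField definitions (the core name below is a placeholder; note
that properties inherited from the fieldType may not show on the field
itself, so check the type as well):

    curl 'http://localhost:8983/solr/my_core/schema/fields'
    curl 'http://localhost:8983/solr/my_core/schema/copyfields'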

Thanks,
Shawn



Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Shawn Heisey
On 3/8/2017 5:30 AM, Caruana, Matthew wrote:
> After upgrading to 6.4.2 from 6.4.1, we’ve seen replication time for a
> 200gb index decrease from 45 hours to 1.5 hours. 

Just to check how long it takes to move a large amount of data over a
network, I started a copy of a 32GB directory over a 100Mb/s network
using a Windows client and a Samba server.  It said it would take 50
minutes.  At this rate, copying 200GB would take over five hours.  This
is quite a bit longer than I expected, but I hadn't done the math to
check transfer rate against size.
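
To make the arithmetic explicit: 100 Mb/s is at best about 12.5 MB/s, so even
at full wire speed 32 GB is roughly 32,768 MB / 12.5 MB/s, or about 2,600
seconds (~44 minutes). At the ~50-minute rate actually observed (~11 MB/s),
200 GB works out to about 5.2 hours.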

Assuming that you actually intended to use the word "replication" there
(and not something like "rebuild"), this tells me that your network is
considerably faster than 100 megabits per second, probably gigabit, and
that the bottleneck is the speed of the disks.

I see a previous thread where you asked about optimization performance,
so it sounds like you are optimizing the master index which causes a
full replication to slaves.  This is one of the reasons that
optimization is generally not recommended except on very small indexes
or indexes that do not change very often.

Thanks,
Shawn



Re: SOLR Atomic update of custom stored metadata clears full-text index! How to add metadata without losing full-text search

2017-03-08 Thread Nicolas Bouillon
Hi Erick

Thanks a lot for the elaborated answer. Let me give some precisions:

1. I upload the docs using an AJAX multipart form POST to my server.
2. The PHP target of the post takes the file and stores it on disk.
3. If the file is moved successfully from the TEMP dir to its final destination, I 
then call SOLR as follows:

It’s a curl POST request:

URL: http://my_server:8983/solr/my_core/update/extract/?; . $fields . 
"=" . $id . "=*=true
HEADER: Content-type: multipart/form-data
POSTFIELDS: the entire file that has just been stored
(BTW, it’s PHP-specific, but I send a CurlFile in an array as follows: 
array('myfile' => $cfile))

In the URL, the parameter $fields contains the following:

$fields = "literal.kref=" . $id . "&literal.ktype=" . $type . 
"&literal.kattachment=" . $attachment;

Where kref, ktype and kattachment are my custom fields (that I added to the 
schema.xml previously)

So, indeed it’s Tika that extracts the info. I didn’t change anything to the 
ExtractHandler.
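
Stripped of the PHP, the request boils down to something like this (the core
name, id value and file path here are only placeholders, adjust them to what
is actually sent; the literal.* parameters set the custom fields at extract
time):

    curl 'http://localhost:8983/solr/my_core/update/extract?literal.id=doc-10&literal.kref=10&literal.ktype=invoice&literal.kattachment=1&commit=true' \
         -F 'myfile=@/path/to/invoice.pdf'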

I read about the fact that all fields must be marked as stored=true but:

- I checked in the schema: all the fields that matter (the default Tika-extracted 
fields) and my custom fields are stored=true.
- I suppose that the full-text index is not stored in a field? And therefore 
cannot be marked as stored?

I manage to upload files and mark my docs with metadata, but I have existing 
files where I would like to update my fields (kref, …) without re-extracting, 
and I’d also like to allow for re-indexing if needed without overwriting my 
fields.

I’m stuck… I wonder whether it wouldn’t be simpler for me to write a custom 
handler of some sort, but I don’t really program in Java.

Cheers

Nico

> On 8 Mar 2017, at 17:03, Erick Erickson  wrote:
> 
> Nico:
> 
> This is the place  for such questions! I'm not quite sure the source
> of the docs. When you say you "extract", does that mean you're using
> the ExtractingRequestHandler, i.e. uploading PDF or Word etc. to Solr
> and letting Tika parse it out? IOW, where is the fulltext coming from?
> 
> For adding tags any time, Solr has "Atomic Updates" that has a couple
> of requirements, mainly you have to set stored="true" for all your
> fields _except_ the destinations for any copyField directives. Under
> the covers this pulls the stored data from Solr, overlays it with the
> new data you've sent and re-indexes it. The expense here is that your
> index will increase in size, but storing the data doesn't mean much of
> an increase in JVM requirements. That is, say your index doubles in
> size. Your JVM heap requirements may increase 5% (and, frankly I doubt
> that much, but I've never measured). FWIW, the on-disk size should
> increase by roughly 50% of the raw data size. WARNING: "raw data size"
> is the size _after_ extraction, so say you're indexing a 1K XML doc
> where the tags are taking up .75K. Then the on-disk memory should go
> up roughly .125K (50% of .25K)..
> 
> Don't worry about "thousands" of docs ;) On my laptop I index over 1K
> Wikipedia articles a second (YMMV of course). Without any particular
> tuning. Without sharding. Very often the most expensive part of
> indexing is acquiring the data in the first place, i.e. getting it
> from a DB or extracting it from Tika. Solr will handle quite a load.
> 
> And, if you're using the ExtractingRequestHandler, I'd seriously think
> about moving it to a Client. Here's a Java example:
> https://lucidworks.com/2012/02/14/indexing-with-solrj/
> 
> Best,
> Erick
> 
> On Wed, Mar 8, 2017 at 7:46 AM, Nicolas Bouillon
>  wrote:
>> Dear SOLR friends,
>> 
>> I developed a small ERP. I produce PDF documents linked to objects in my 
>> ERP: invoices, timesheets, contracts, etc...
>> I have also the possibility to attach documents to a particular object and 
>> when I view an invoice for instance, I can see the attached documents.
>> 
>> Until now, I was adding reference to these documents in my DB and store docs 
>> on the server.
>> Still, I found it cumbersome and not flexible enough, so I removed the table 
>> documents from my DB and decided to use SOLR to add metadata to the 
>> documents in the index.
>> 
>> Currently, I have the following custom fields:
>> - ktype (string): invoice, contract, etc…
>> - kattachment (int): 0 or 1
>> - kref (int): reference in DB of linked object, ex: 10 (for contract 10 in 
>> DB)
>> - ktags (strings, mutifield): free tags, ex: customerX, consulting, 
>> development
>> 
>> Each time I upload a document, I store in on server and then add it to SOLR 
>> using "extract" adding the metadata at the same time. It works fine.
>> 
>> I would like now 3 things:
>> 
>> - For existing documents that have not been extracted with metadata 
>> altogether at upload (documents uploaded before I developed the 
>> functionality), I'd like to update them with the proper metadata without 
>> losing the full-text search
>> - Be able to add anytime tags to the ktags field after upload whilst keeping 
>> full-text search
>> - In 

Re: SOLR Atomic update of custom stored metadata clears full-text index! How to add metadata without losing full-text search

2017-03-08 Thread Erick Erickson
Nico:

This is the place  for such questions! I'm not quite sure the source
of the docs. When you say you "extract", does that mean you're using
the ExtractingRequestHandler, i.e. uploading PDF or Word etc. to Solr
and letting Tika parse it out? IOW, where is the fulltext coming from?

For adding tags at any time, Solr has "Atomic Updates", which come with a
couple of requirements, mainly that you set stored="true" for all your
fields _except_ the destinations of any copyField directives. Under
the covers this pulls the stored data from Solr, overlays it with the
new data you've sent and re-indexes the document. The expense here is
that your index will increase in size, but storing the data doesn't mean
much of an increase in JVM requirements. That is, say your index doubles
in size: your JVM heap requirements may increase 5% (and, frankly, I
doubt it's even that much, but I've never measured). FWIW, the on-disk
size should increase by roughly 50% of the raw data size. WARNING: "raw
data size" is the size _after_ extraction, so say you're indexing a 1K
XML doc where the tags take up .75K. Then the on-disk size should go
up by roughly .125K (50% of .25K).
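
As a sketch of what an atomic update request looks like (JSON update format;
"id" is assumed to be the uniqueKey, the core name is a placeholder, and the
field names are borrowed from the post quoted below):

    curl -H 'Content-Type: application/json' \
         'http://localhost:8983/solr/my_core/update?commit=true' -d '
    [{"id":    "some-doc-id",
      "ktags": {"add": "customerX"},
      "kref":  {"set": 10}}]'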

Don't worry about "thousands" of docs ;) On my laptop I index over 1K
Wikipedia articles a second (YMMV of course). Without any particular
tuning. Without sharding. Very often the most expensive part of
indexing is acquiring the data in the first place, i.e. getting it
from a DB or extracting it from Tika. Solr will handle quite a load.

And, if you're using the ExtractingRequestHandler, I'd seriously think
about moving it to a Client. Here's a Java example:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick

On Wed, Mar 8, 2017 at 7:46 AM, Nicolas Bouillon
 wrote:
> Dear SOLR friends,
>
> I developed a small ERP. I produce PDF documents linked to objects in my ERP: 
> invoices, timesheets, contracts, etc...
> I have also the possibility to attach documents to a particular object and 
> when I view an invoice for instance, I can see the attached documents.
>
> Until now, I was adding reference to these documents in my DB and store docs 
> on the server.
> Still, I found it cumbersome and not flexible enough, so I removed the table 
> documents from my DB and decided to use SOLR to add metadata to the documents 
> in the index.
>
> Currently, I have the following custom fields:
> - ktype (string): invoice, contract, etc…
> - kattachment (int): 0 or 1
> - kref (int): reference in DB of linked object, ex: 10 (for contract 10 in DB)
> - ktags (strings, mutifield): free tags, ex: customerX, consulting, 
> development
>
> Each time I upload a document, I store in on server and then add it to SOLR 
> using "extract" adding the metadata at the same time. It works fine.
>
> I would like now 3 things:
>
> - For existing documents that have not been extracted with metadata 
> altogether at upload (documents uploaded before I developed the 
> functionality), I'd like to update them with the proper metadata without 
> losing the full-text search
> - Be able to add anytime tags to the ktags field after upload whilst keeping 
> full-text search
> - In case I have to re-index, I want to be sure I don't have to restart 
> everything from scratch.
> In a few months, I expect to have thousands of docs in my 
> systemand then I'll add emails
>
> I have very little experience in SOLR. I know I can re-perform an extract 
> instead of an update when I modify a field but I'm pretty sure it's not the 
> right thing to do + performance problems can arise.
>
> What do you suggest me to do?
>
> I thought about storing the metadata linked to each document separately (in 
> DB or separate XML file individually or one XML for all) but I'm pretty sure 
> it will be very slow after a while.
>
> Thx a lot in advance fro your precious help.
> This is my first message to the user list, please excuse anything I may have 
> done wrong…I learn fast, don’t worry..
>
> Regards
>
> Nico
>
> My configuration:
>
> Synology 1511 running DSM 6.1
> Docker container for SOLR using latest stable version
> 1 core called “katalyst” containing index of all documents
>
> ERP is written in PHP/Mysql for backend and Jquery/Bootstrap for front-end
>
> I have a test env on OSX Sierra running docker, a prod environment on Synology
>
>


SOLR Atomic update of custom stored metadata clears full-text index! How to add metadata without losing full-text search

2017-03-08 Thread Nicolas Bouillon
Dear SOLR friends,

I developed a small ERP. I produce PDF documents linked to objects in my ERP: 
invoices, timesheets, contracts, etc...
I can also attach documents to a particular object, and when I view an invoice, 
for instance, I can see the attached documents.

Until now, I was adding references to these documents in my DB and storing the 
docs on the server. 
Still, I found it cumbersome and not flexible enough, so I removed the table 
documents from my DB and decided to use SOLR to add metadata to the documents 
in the index.

Currently, I have the following custom fields: 
- ktype (string): invoice, contract, etc… 
- kattachment (int): 0 or 1 
- kref (int): reference in DB of linked object, ex: 10 (for contract 10 in DB) 
- ktags (strings, multivalued field): free tags, ex: customerX, consulting, development

Each time I upload a document, I store it on the server and then add it to SOLR 
using "extract", adding the metadata at the same time. It works fine.

I would like now 3 things:

- For existing documents that were not extracted together with their metadata 
at upload time (documents uploaded before I developed the functionality), I'd like 
to update them with the proper metadata without losing the full-text search
- Be able to add tags to the ktags field at any time after upload whilst keeping 
full-text search
- In case I have to re-index, I want to be sure I don't have to restart 
everything from scratch. 
In a few months, I expect to have thousands of docs in my system, and 
then I'll add emails.

I have very little experience in SOLR. I know I can re-run an extract 
instead of an update when I modify a field, but I'm pretty sure that's not the 
right thing to do, and performance problems can arise.

What do you suggest me to do?

I thought about storing the metadata linked to each document separately (in the DB, 
in one XML file per document, or in a single XML file for all) but I'm pretty sure it 
will become very slow after a while.

Thx a lot in advance for your precious help.
This is my first message to the user list; please excuse anything I may have 
done wrong… I learn fast, don’t worry.

Regards

Nico

My configuration:

Synology 1511 running DSM 6.1
Docker container for SOLR using latest stable version
1 core called “katalyst” containing index of all documents

ERP is written in PHP/Mysql for backend and Jquery/Bootstrap for front-end

I have a test env on OSX Sierra running docker, a prod environment on Synology




Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Erick Erickson
Caruana:

Thanks for that info.

Do you know offhand how that 1.5 hours compares to earlier versions?
I'm wondering whether there is further work to be done here or whether
we are back to previous speeds.

Thanks
Erick

On Wed, Mar 8, 2017 at 4:30 AM, Caruana, Matthew  wrote:
> After upgrading to 6.4.2 from 6.4.1, we’ve seen replication time for a 200gb 
> index decrease from 45 hours to 1.5 hours.
>
>> On 7 Mar 2017, at 20:32, Ishan Chattopadhyaya  wrote:
>>
>> 7 March 2017, Apache Solr 6.4.2 available
>>
>> Solr is the popular, blazing fast, open source NoSQL search platform from
>> the Apache Lucene project. Its major features include powerful full-text
>> search, hit highlighting, faceted search and analytics, rich document
>> parsing, geospatial search, extensive REST APIs as well as parallel SQL.
>> Solr is enterprise grade, secure and highly scalable, providing fault
>> tolerant distributed search and indexing, and powers the search and
>> navigation features of many of the world's largest internet sites.
>>
>> Solr 6.4.2 is available for immediate download at:
>>
>>   -
>>
>>   http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
>>
>> Please read CHANGES.txt for a full list of new features and changes:
>>
>>   -
>>
>>   https://lucene.apache.org/solr/6_4_2/changes/Changes.html
>>
>> Solr 6.4.2 contains 4 bug fixes since the 6.4.1 release:
>>
>>   -
>>
>>   Serious performance degradation in Solr 6.4 due to the metrics
>>   collection. IndexWriter metrics collection turned off by default, directory
>>   level metrics collection completely removed (until a better design is
>>   found)
>>   -
>>
>>   Transaction log replay can hit an NullPointerException due to new
>>   Metrics code
>>   -
>>
>>   NullPointerException in CloudSolrClient when reading stale alias
>>   -
>>
>>   UnifiedHighlighter and PostingsHighlighter bug in PrefixQuery and
>>   TermRangeQuery for multi-byte text
>>
>> Further details of changes are available in the change log available at:
>> http://lucene.apache.org/solr/6_4_2/changes/Changes.html
>>
>> Please report any feedback to the mailing lists (http://lucene.apache.org/
>> solr/discussion.html)
>> Note: The Apache Software Foundation uses an extensive mirroring network
>> for distributing releases. It is possible that the mirror you are using may
>> not have replicated the release yet. If that is the case, please try
>> another mirror. This also applies to Maven access.
>


Re: DIH Full Index Issue

2017-03-08 Thread Alexandre Rafalovitch
Are you perhaps indexing at the same time from the source other than
DIH? Because the commit is global and all the changes from all the
sources will become visible.

Check the access logs perhaps to see the requests to /update handler or similar.
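
With the default logging setup, stray update traffic also shows up in solr.log
(the LogUpdateProcessor logs each /update request at INFO), so something like
this can help confirm it (log location assumes a standard install):

    grep 'path=/update' server/logs/solr.log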

Regards,
   Alex.



http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 8 March 2017 at 09:27, AJ Lemke  wrote:
> Good Morning List!
>
> I have an issue where my DIH full index is committed after a minute of 
> indexing.
> My counts will fall from around 400K to 85K until the import is finished, 
> usually about four (4) minutes later.
>
> This is problematic for us as there are 315K missing items in our searches.
>
> Versioning Info:
> solr-spec - 6.3.0
> solr-impl - 6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 
> 2016-11-02 19:52:42
> lucene-spec - 6.3.0
> lucene-impl - 6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 
> 2016-11-02 19:47:11
>
> solrconfig.xml snippet
>
> <updateHandler>
>   <updateLog>
>     <str name="dir">${solr.ulog.dir:}</str>
>   </updateLog>
>   <autoCommit>
>     <maxTime>-1</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>   <autoSoftCommit>
>     <maxTime>-1</maxTime>
>   </autoSoftCommit>
> </updateHandler>
>
>
> Any insights would be greatly appreciated.
> Let me know if more information is required.
>
> AJ


Re: LTR on multiple shards

2017-03-08 Thread Michael Nilsson
Hey Vincent,

The feature store and model store are both Solr Managed Resources.  To
propagate managed resources in distributed mode, including managed
stopwords and synonyms, you have to issue a collection reload command.  The
Managed Resources page of the Solr Reference Guide has a bit more on this in
the Applying Changes section.

https://cwiki.apache.org/confluence/display/solr/Managed+Resources
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RELOAD:ReloadaCollection
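
I.e. something like (the collection name is a placeholder):

    curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=[COLLECTION]'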

The Managed Resource page and LTR page should be updated to be more
explicit about it.

Hope that helps,
Michael



On Wed, Mar 8, 2017 at 5:01 AM, Vincent  wrote:

> Hi all,
>
> It seems that the curl commands from the LTR wiki (
> https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank) to
> post and/or delete features from and to the feature store only affect one
> shard instead of the entire collection. For example, when I run:
>
> curl -XDELETE 'http://localhost:8983/solr/[COLLECTION]/schema/feature-store/currentFeatureStore'
>
> the feature store still exists on one of my two shards. Same goes for the
> python HTTPConnection.request-function ("POST" and "DELETE").
>
> Is this a mistake on my end? I assume it's not supposed to work this way?
>
> Thanks a lot!
> Vincent
>


DIH Full Index Issue

2017-03-08 Thread AJ Lemke
Good Morning List!

I have an issue where my DIH full index is committed after a minute of indexing.
My counts will fall from around 400K to 85K until the import is finished, 
usually about four (4) minutes later.

This is problematic for us as there are 315K missing items in our searches.

Versioning Info:
solr-spec - 6.3.0
solr-impl - 6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 
2016-11-02 19:52:42
lucene-spec - 6.3.0
lucene-impl - 6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 
2016-11-02 19:47:11

solrconfig.xml snippet

<updateHandler>
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxTime>-1</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>


Any insights would be greatly appreciated.
Let me know if more information is required.

AJ


Re: Does {!child} query support nested Queries ("v=")

2017-03-08 Thread Mikhail Khludnev
Hello, Frank.

It's not clear what your uniqueKey field is. I guess that per-shard {!child}
results might clash by id during merge. Can you make sure that the child
documents' ids are unique across all shards?
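
One crude way to check from the outside is to facet on the id field and look
for counts above 1 (the "id" and "type" field names are taken from the
documents quoted below; faceting on a high-cardinality field like id is
expensive, so treat this as a one-off diagnostic):

    curl 'http://localhost:8983/solr/[COLLECTION]/select?q=type:userLinkedAccount&rows=0&facet=true&facet.field=id&facet.mincount=2&facet.limit=20'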

On Mon, Mar 6, 2017 at 10:47 PM, Kelly, Frank  wrote:

> Hi Mikhail,
>   Sorry I didn’t reply sooner
>
> Here are some example docs - each document for a userAccount object has 1
> or more nested documents for our userLinkedAccount object
>
> SolrInputDocument(fields: [type=userAccount,
> typeId=userAccount/HERE-8ce41333-7c08-40d3-9b2c-REDACTED,
> id=userAccount/HERE-8ce41333-7c08-40d3-9b2c-REDACTED,
> emailAddress=[redac...@here.com, REDACTED here.com], nameSort=�,
> emailType=Primary, familyName=REDACTED, allText=[REDACTED, REDACTED ,
> untokenized=[REDACTED, REDACTED , isEnabled=1,
> createdTimeNumeric=1406972278682,
> haAccountId=HERE-8ce41333-7c08-40d3-9b2c-REDACTED, givenName=REDACTED,
> readAccess=application, indexTime=1488828050933])
> SolrInputDocument(fields: [type=userLinkedAccount,
> typeId=userLinkedAccount/5926990ea0708fa82c9ddca5d1bda6ed3331a450,
> id=userLinkedAccount/5926990ea0708fa82c9ddca5d1bda6ed3331a450,
> haAccountId=HERE-8ce41333-7c08-40d3-9b2c-REDACTED, nameSort=�,
> hereRealm=HERE, haAccountType=password, haUserId= redac...@here.com,
> readAccess=application, createdTimeNumeric=1406972278646,
> indexTime=1488828050933])
>
> SolrInputDocument(fields: [type=userAccount,
> typeId=userAccount/HERE-4797487f-7659-4c58-80b5-REDACTED,
> id=userAccount/HERE-4797487f-7659-4c58-80b5-REDACTED,
> emailAddress=[redac...@live.de, redac...@live.de], nameSort=�,
> emailType=Primary, familyName= REDACTED, allText=[REDACTED, REDACTED],
> untokenized=[REDACTED, REDACTED], isEnabled=1,
> createdTimeNumeric=1447141199050,
> haAccountId=HERE-4797487f-7659-4c58-80b5-REDACTED, givenName=Krzysztof,
> readAccess=application, indexTime=1488828050941])
> SolrInputDocument(fields: [type=userLinkedAccount,
> typeId=userLinkedAccount/02d11e8096dc4727ee7c2c4f6cc4723190620088,
> id=userLinkedAccount/02d11e8096dc4727ee7c2c4f6cc4723190620088,
> haAccountId=HERE-4797487f-7659-4c58-80b5-REDACTED, nameSort=�,
> hereRealm=HERE, haAccountType=password, haUserId=redac...@live.de,
> readAccess=application, createdTimeNumeric=1447141199009,
> indexTime=1488828050941])
>
> SolrInputDocument(fields: [type=userAccount,
> typeId=userAccount/HERE-8ce41333-7c08-40d3-9b2c-REDACTED,
> id=userAccount/HERE-8ce41333-7c08-40d3-9b2c-REDACTED,
> emailAddress=[redac...@here.com, REDACTED here.com], nameSort=�,
> emailType=Primary, familyName= REDACTED, allText=[REDACTED, REDACTED],
> untokenized=[REDACTED, REDACTED], isEnabled=1,
> createdTimeNumeric=1406972278682,
> haAccountId=HERE-8ce41333-7c08-40d3-9b2c-REDACTED, givenName= REDACTED,
> readAccess=application, indexTime=1488828051697])
> SolrInputDocument(fields: [type=userLinkedAccount,
> typeId=userLinkedAccount/5926990ea0708fa82c9ddca5d1bda6ed3331a450,
> id=userLinkedAccount/5926990ea0708fa82c9ddca5d1bda6ed3331a450,
> haAccountId=HERE-8ce41333-7c08-40d3-9b2c-REDACTED, nameSort=�,
> hereRealm=HERE, haAccountType=password, haUserId= redac...@here.com,
> readAccess=application, createdTimeNumeric=1406972278646,
> indexTime=1488828051697])
>
>
> So we often want to
> FIND userLinkedAccount document WHERE parentDocument has some filter
> properties e.g. Name / email address
> E.g.
>
> +type:userLinkedAccount +{!child of="type:userAccount"
> v="givenName:frank*”}
>
> The results appear to come back fine but the numFound often has a small
> delta we cannot explain
>
> Here is the output of the debugQuery
>
> "rawquerystring": "+type:userLinkedAccount +{!child
> of=\"type:userAccount\" v=\"givenName:frank*\"}",
> "querystring": "+type:userLinkedAccount +{!child
> of=\"type:userAccount\" v=\"givenName:frank*\"}",
> "parsedquery": "+type:userLinkedAccount
> +ToChildBlockJoinQuery(ToChildBlockJoinQuery (givenName:frank*))",
> "parsedquery_toString": "+type:userLinkedAccount
> +ToChildBlockJoinQuery (givenName:frank*)",
> "QParser": "LuceneQParser",
> "explain": {
>   "userLinkedAccount/eb86bc13944094ce16f684a7f58e2294c84ca956":
> "\n1.9348345 = sum of:\n  1.4179944 = weight(type:userLinkedAccount in
> 84623) [DefaultSimilarity], result of:\n1.4179944 =
> score(doc=84623,freq=1.0), product of:\n  0.85608196 = queryWeight,
> product of:\n1.6563768 = idf(docFreq=14190942, maxDocs=27357228)\n
>0.5168401 = queryNorm\n  1.6563768 = fieldWeight in 84623,
> product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 =
> termFreq=1.0\n1.6563768 = idf(docFreq=14190942,
> maxDocs=27357228)\n1.0 = fieldNorm(doc=84623)\n  0.5168401 = Score
> based on parent document 84624\n0.5168401 = givenName:frank*, product
> of:\n  1.0 = boost\n  0.5168401 = queryNorm\n",
>   "userLinkedAccount/78498d9d7d5c1a52de0f61d90df138ac7381d37f":
> "\n1.9348345 = sum of:\n  1.4179944 = weight(type:userLinkedAccount in
> 113884) 

Re: DrillSideWaysSearch on faceting

2017-03-08 Thread Chitra
Hi,
Thank you so much.

On Wed, Mar 8, 2017 at 1:58 PM, Mikhail Khludnev  wrote:

> Hello, Chitra.
>
> Check this http://yonik.com/multi-select-faceting/ and
> https://wiki.apache.org/solr/SimpleFacetParameters#Multi-
> Select_Faceting_and_LocalParams
>
>
> On Wed, Mar 8, 2017 at 7:09 AM, Chitra  wrote:
>
> > Hi,
> >   I am a new one to Solr. Recently we are digging drill sideways
> search
> > (for faceting purpose) on Lucene. Is that solr facets support drill
> > sideways search like Lucene?? If yes, Kindly suggest the API or article
> how
> > to use.
> >
> >
> > Any help is much appreciated.
> >
> >
> > Thanks,
> > Chitra
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Caruana, Matthew
After upgrading to 6.4.2 from 6.4.1, we’ve seen replication time for a 200gb 
index decrease from 45 hours to 1.5 hours.

> On 7 Mar 2017, at 20:32, Ishan Chattopadhyaya  wrote:
> 
> 7 March 2017, Apache Solr 6.4.2 available
> 
> Solr is the popular, blazing fast, open source NoSQL search platform from
> the Apache Lucene project. Its major features include powerful full-text
> search, hit highlighting, faceted search and analytics, rich document
> parsing, geospatial search, extensive REST APIs as well as parallel SQL.
> Solr is enterprise grade, secure and highly scalable, providing fault
> tolerant distributed search and indexing, and powers the search and
> navigation features of many of the world's largest internet sites.
> 
> Solr 6.4.2 is available for immediate download at:
> 
>   -
> 
>   http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
> 
> Please read CHANGES.txt for a full list of new features and changes:
> 
>   -
> 
>   https://lucene.apache.org/solr/6_4_2/changes/Changes.html
> 
> Solr 6.4.2 contains 4 bug fixes since the 6.4.1 release:
> 
>   -
> 
>   Serious performance degradation in Solr 6.4 due to the metrics
>   collection. IndexWriter metrics collection turned off by default, directory
>   level metrics collection completely removed (until a better design is
>   found)
>   -
> 
>   Transaction log replay can hit an NullPointerException due to new
>   Metrics code
>   -
> 
>   NullPointerException in CloudSolrClient when reading stale alias
>   -
> 
>   UnifiedHighlighter and PostingsHighlighter bug in PrefixQuery and
>   TermRangeQuery for multi-byte text
> 
> Further details of changes are available in the change log available at:
> http://lucene.apache.org/solr/6_4_2/changes/Changes.html
> 
> Please report any feedback to the mailing lists (http://lucene.apache.org/
> solr/discussion.html)
> Note: The Apache Software Foundation uses an extensive mirroring network
> for distributing releases. It is possible that the mirror you are using may
> not have replicated the release yet. If that is the case, please try
> another mirror. This also applies to Maven access.



Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Bernd Fehling
Shouldn't <luceneMatchVersion>6.4.1</luceneMatchVersion> in
server/solr/configsets/.../solrconfig.xml really read
<luceneMatchVersion>6.4.2</luceneMatchVersion>?

Maybe something for the package builder for future releases?

Regards
Bernd

Am 07.03.2017 um 20:32 schrieb Ishan Chattopadhyaya:
> 7 March 2017, Apache Solr 6.4.2 available
> 
> Solr is the popular, blazing fast, open source NoSQL search platform from
> the Apache Lucene project. Its major features include powerful full-text
> search, hit highlighting, faceted search and analytics, rich document
> parsing, geospatial search, extensive REST APIs as well as parallel SQL.
> Solr is enterprise grade, secure and highly scalable, providing fault
> tolerant distributed search and indexing, and powers the search and
> navigation features of many of the world's largest internet sites.
> 
> Solr 6.4.2 is available for immediate download at:
> 
>-
> 
>http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
> 
> Please read CHANGES.txt for a full list of new features and changes:
> 
>-
> 
>https://lucene.apache.org/solr/6_4_2/changes/Changes.html
> 
> Solr 6.4.2 contains 4 bug fixes since the 6.4.1 release:
> 
>-
> 
>Serious performance degradation in Solr 6.4 due to the metrics
>collection. IndexWriter metrics collection turned off by default, directory
>level metrics collection completely removed (until a better design is
>found)
>-
> 
>Transaction log replay can hit an NullPointerException due to new
>Metrics code
>-
> 
>NullPointerException in CloudSolrClient when reading stale alias
>-
> 
>UnifiedHighlighter and PostingsHighlighter bug in PrefixQuery and
>TermRangeQuery for multi-byte text
> 
> Further details of changes are available in the change log available at:
> http://lucene.apache.org/solr/6_4_2/changes/Changes.html
> 
> Please report any feedback to the mailing lists (http://lucene.apache.org/
> solr/discussion.html)
> Note: The Apache Software Foundation uses an extensive mirroring network
> for distributing releases. It is possible that the mirror you are using may
> not have replicated the release yet. If that is the case, please try
> another mirror. This also applies to Maven access.
> 

-- 
*
Bernd Fehling                  Bielefeld University Library
Dipl.-Inform. (FH)             LibTec - Library Technology
Universitätsstr. 25            and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


LTR on multiple shards

2017-03-08 Thread Vincent

Hi all,

It seems that the curl commands from the LTR wiki 
(https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank) to 
post features to and/or delete them from the feature store only affect 
one shard instead of the entire collection. For example, when I run:


curl -XDELETE 'http://localhost:8983/solr/[COLLECTION]/schema/feature-store/currentFeatureStore'


the feature store still exists on one of my two shards. Same goes for 
the python HTTPConnection.request-function ("POST" and "DELETE").


Is this a mistake on my end? I assume it's not supposed to work this way?

Thanks a lot!
Vincent


Re: DrillSideWaysSearch on faceting

2017-03-08 Thread Mikhail Khludnev
Hello, Chitra.

Check this http://yonik.com/multi-select-faceting/ and
https://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams
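
The pattern from those pages looks roughly like this (the field name and value
are made up for illustration; -g stops curl from treating the braces as a
glob):

    curl -g 'http://localhost:8983/solr/[COLLECTION]/select?q=*:*&rows=0&fq={!tag=brandTag}brand:acme&facet=true&facet.field={!ex=brandTag}brand'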


On Wed, Mar 8, 2017 at 7:09 AM, Chitra  wrote:

> Hi,
>   I am a new one to Solr. Recently we are digging drill sideways search
> (for faceting purpose) on Lucene. Is that solr facets support drill
> sideways search like Lucene?? If yes, Kindly suggest the API or article how
> to use.
>
>
> Any help is much appreciated.
>
>
> Thanks,
> Chitra
>



-- 
Sincerely yours
Mikhail Khludnev