Re: SOLR indexing takes longer time

2020-08-17 Thread Aroop Ganguly
Adding on to what others have said, indexing speed in general is largely affected by the parallelism and isolation you can give to each node. Is there a reason why you cannot have more than 1 shard? If you have a 5-node cluster, why not have 5 shards? maxShardsPerNode=1 with replica=1 is OK. You should
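
For illustration, the 5-shard layout suggested above could be requested through the SolrCloud Collections API. This is only a sketch: the base URL and the collection name "docs" are placeholders, and mapping "replica=1" to replicationFactor=1 is an assumption about what was meant. maxShardsPerNode is accepted by Solr versions of that era but is not available in every release, so check the reference guide for the version in use.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards,
                          replication_factor=1, max_shards_per_node=1):
    """Build a Collections API CREATE URL (placeholder host and name)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        # Assumption: "replica=1" in the message means one copy per shard.
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical 5-node cluster: one shard per node, no extra replicas.
url = create_collection_url("http://localhost:8983/solr", "docs", 5)
```

The function only builds the URL; it can be opened with curl or a browser against a node of a cluster running in SolrCloud mode.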

Re: Solr ping taking 600 seconds

2020-08-17 Thread Susheel Kumar
yes, Alex. This is reproducible. Will check if we can run Wireshark. Thank you. On Mon, Aug 17, 2020 at 8:11 PM Alexandre Rafalovitch wrote: > If this is reproducible, I would run Wireshark on the network and see what > happens at packet level. > > Leaning towards firewall timing out and just

Re: SOLR indexing takes longer time

2020-08-17 Thread Shawn Heisey
On 8/17/2020 12:22 PM, Abhijit Pawar wrote: We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / replicas and just single core. It takes almost 3.5 hours to index that data. I am using a data import handler to import data from the mongo database. Is there something we can do

Re: Solr ping taking 600 seconds

2020-08-17 Thread Susheel Kumar
Thanks for all the responses. Shawn, to your point: both ping and select intermittently take 600+ seconds to return. As you can see below, the 1st ping attempt was fine and the 2nd took a long time. Similarly for select: a couple of selects returned fine and then one suddenly took a long time. I'll try to run
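
One way to capture the pattern described above (a fast first request followed by a very slow one) is to time a series of identical requests from the client side. This is a rough sketch, not the poster's setup: the URL is a placeholder, and the 900-second timeout is an assumption chosen only so that it exceeds the 600 seconds reported.

```python
import time
import urllib.request


def fetch(url, timeout):
    """Perform one GET and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


def time_requests(url, attempts=5, timeout=900, get=fetch, clock=time.monotonic):
    """Return a list of (status_or_error, elapsed_seconds), one per attempt."""
    timings = []
    for _ in range(attempts):
        start = clock()
        try:
            status = get(url, timeout)
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            status = repr(exc)
        timings.append((status, clock() - start))
    return timings
```

Calling time_requests("http://localhost:8983/solr/mycollection/admin/ping") against the real host, both directly and through any proxy or firewall in between, would show whether the delay appears only on one network path.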

Re: Manipulating client's query using a Query object

2020-08-17 Thread Edward Turner
Hi Markus, That's really great info. Thank you. Supposing we've now modified the Query object, do you know how we would get the corresponding query String, which we could then forward to our SolrCloud via SolrClient? (Or should we be using this extended ExtendedDisMaxQParser class server side

Re: Manipulating client's query using a Query object

2020-08-17 Thread Erick Erickson
Ed: Right, doing this in a custom query parser on the Solr end that subclasses edismax, or something similar, is probably the way to go. Especially because String->parsed query->String, even without any changes in the parsing, is _not_ guaranteed to give you the same string back. I’m not clear on whether

Re: SOLR indexing takes longer time

2020-08-17 Thread Abhijit Pawar
Sure Divye, *Here's the config.* *conf/solr-config.xml:* /home/ec2-user/solr/solr-5.4.1/server/solr/test_core/conf/dataimport/data-source-config.xml *schema.xml:* has all of the field definitions *conf/dataimport/data-source-config.xml* . . . 4-5 more nested

Re: Elevation with distributed search causes NPE

2020-08-17 Thread Erick Erickson
I "enabled patch review"… Thanks! > On Aug 17, 2020, at 9:43 AM, smk wrote: > > Hi Erick, > > I am a colleague of Marc and have attached a patch to his ticket > https://issues.apache.org/jira/browse/SOLR-14662 > Can you set the ticket to "Patch Available"? For me there is no such option. > >

Re: SOLR indexing takes longer time

2020-08-17 Thread Jörn Franke
The DIH is single-threaded and deprecated. Your best bet is to have a script/program that extracts the data from MongoDB and writes it to Solr in batches using multiple threads. You will see significantly higher performance for your data. > On 17.08.2020 at 20:23, Abhijit Pawar wrote: > > Hello,
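
A minimal sketch of the batched, multi-threaded approach described above, using only the Python standard library. It is not the poster's code: the batch size, worker count, and update URL are assumptions, and the MongoDB extraction is left out, so `docs` stands for any iterable of dictionaries produced by whatever extractor is used.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def post_batch(update_url, batch):
    """POST one batch as a JSON array to Solr's update handler."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status


def index_all(docs, update_url, batch_size=1000, workers=4, send=post_batch):
    """Send batches from several threads; return the number of batches sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda b: send(update_url, b),
                                batched(docs, batch_size)))
    return len(results)
```

With a placeholder URL such as http://localhost:8983/solr/test_core/update, index_all(docs, url) would send the documents in parallel; a commit (for example a final request with commit=true) is still needed before they become searchable. Note that pool.map queues every batch up front, which is acceptable for a few hundred thousand documents but would need throttling for much larger sets.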

Re: SOLR indexing takes longer time

2020-08-17 Thread Walter Underwood
I’m seeing multiple red flags for performance here. The top ones are “DIH”, “MongoDB”, and “SQL on MongoDB”. MongoDB is not a relational database. Our multi-threaded extractor using the Mongo API was still three times slower than the same approach on MySQL. Check the CPU usage on the Solr hosts

RE: Manipulating client's query using a Query object

2020-08-17 Thread Markus Jelsma
Hello Edward, You asked for the 'Lucene Query representation of the client's query' which is already inside Solr and needs no forwarding to anything. Just return in parse() and you are good to go. The Query object contains the analyzed form of your query string. ExtendedDismax has some

Re: Manipulating client's query using a Query object

2020-08-17 Thread Edward Turner
Hi Markus, Many thanks, I see what you are saying. My question was: Question: is it possible to get a Lucene Query representation of the client's query, which we can then navigate and manipulate -- before we then send the String representation of this Query to Solr for evaluation? ... and from

Re: SOLR indexing takes longer time

2020-08-17 Thread Divye Handa
Can you share the DIH configuration you are using? On Mon, 17 Aug, 2020, 23:52 Abhijit Pawar, wrote: > Hello, > > We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / > replicas and just single core. > It takes almost 3.5 hours to index that data. > I am using a data

Re: Solr ping taking 600 seconds

2020-08-17 Thread Alexandre Rafalovitch
If this is reproducible, I would run Wireshark on the network and see what happens at packet level. Leaning towards firewall timing out and just starting to drop all packets. Regards, Alex On Mon., Aug. 17, 2020, 6:22 p.m. Susheel Kumar, wrote: > Thanks for the all responses. > > Shawn -

Looking for Solr contractor at Chegg

2020-08-17 Thread Walter Underwood
We plan to upgrade all of our clusters to Solr 8.x and are looking for a contractor. The Solr Cloud clusters are on 6.6.2 and we have a master/slave cluster on 4.10.4 with a customized edismax query parser (eedismax?).

Re: IOException occured when talking to server

2020-08-17 Thread Dominique Bejean
These links do not provide solutions but may provide some ideas for the investigation. I suggest trying the -Djavax.net.debug=all JVM parameter for your client application. Good luck. Dominique On Mon, 17 Aug 2020 at 19:11, Odysci wrote: > Dominique, > thanks, but I'm not sure the

Re: Slow query response from SOLR 5.4.1

2020-08-17 Thread Abhijit Pawar
Jason, Not yet. This issue was on the back burner for a few days. However, we still need to figure out what could be a potential solution to it. The setup is a basic one: one node, no shards or replicas, 2 cores. When I run the query adding debug=timing to raw query parameters it just hangs

SOLR indexing takes longer time

2020-08-17 Thread Abhijit Pawar
Hello, We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / replicas and just single core. It takes almost 3.5 hours to index that data. I am using a data import handler to import data from the mongo database. Is there something we can do to reduce the time taken to index?

Manipulating client's query using a Query object

2020-08-17 Thread Edward Turner
Hi all, Thanks for all your help recently. We're now using the edismax query parser and are happy with its behaviour. We have another question which maybe someone can help with. We have one use case where we optimise our query before sending it to Solr, and we do this by manipulating the

RE: Manipulating client's query using a Query object

2020-08-17 Thread Markus Jelsma
Hello Edward, Yes, you can, by extending ExtendedDismaxQParser [1] and overriding its parse() method. You get the main Query object through super.parse(). If you need even more fine-grained control over how Query objects are created you can extend ExtendedSolrQueryParser's [2] (inner class)

Re: IOException occured when talking to server

2020-08-17 Thread Dominique Bejean
Hi, Can you provide more information? - Solr and ZK version - full error stacktrace generated by SolrJ - any concomitant and relevant information in Solr node logs or ZK logs Just to know, why not use a load balanced LBHttp... Solr Client? Regards. Dominique On Mon, 17 Aug 2020 at 00:41,

Solr with HDFS

2020-08-17 Thread Prashant Jyoti
Hi, I am trying to get Solr running with HDFS but getting the attached exception in logs when trying to create a collection. I have attached the relevant portions of solrconfig.xml and solr.in.cmd that I have modified. Could anybody point me in the right direction? What might I be doing wrong? Any

Re: IOException occured when talking to server

2020-08-17 Thread Dominique Bejean
If you want more detailed debug information from your client application, you can add this parameter while starting Solr JVM. -Djavax.net.debug=all It is very verbose! Dominique On Mon, 17 Aug 2020 at 17:59, Dominique Bejean wrote: > Hi, > > It looks like this issues >

Re: IOException occured when talking to server

2020-08-17 Thread Dominique Bejean
I mean add this parameter to your client application's JVM :) On Mon, 17 Aug 2020 at 18:36, Dominique Bejean wrote: > If you want a more detailed debug information from your client > application, you can add this parameter while starting Solr JVM. > -Djavax.net.debug=all > > It is very

Re: IOException occured when talking to server

2020-08-17 Thread Odysci
Dominique, thanks, but I'm not sure the links you sent point to an actual solution. The Nginx logs sometimes give a 499 return code, which is: (499 Client Closed Request: used when the client has closed the request before the server could send a response), but the timestamps of these log msgs do

Re: Elevation with distributed search causes NPE

2020-08-17 Thread smk
Hi Erick, I am a colleague of Marc and have attached a patch to his ticket https://issues.apache.org/jira/browse/SOLR-14662 Can you set the ticket to "Patch Available"? For me there is no such option. Thank you very much, Thomas -- Sent from:

Re: IOException occured when talking to server

2020-08-17 Thread Dominique Bejean
Hi, It looks like these issues: https://github.com/eclipse/jetty.project/issues/4883 https://github.com/eclipse/jetty.project/issues/2571 The Nginx server closed the connection. Any info in the nginx log? Dominique On Mon, 17 Aug 2020 at 17:33, Odysci wrote: > Hi, > thanks for the reply. >

Re: IOException occured when talking to server

2020-08-17 Thread Odysci
Hi, thanks for the reply. We're using Solr 8.3.1, ZK 3.5.6. The stacktrace is below. The address on the first line "http://192.168.15.10:888/solr/mycollection" is the "server" address in my nginx configuration, which points to 2 upstream Solr nodes. There were no other Solr or ZK messages in the