RE: Windows Service

2016-03-03 Thread Cannon, Steven
Thanks Guilherme, That took about two seconds to get working :) However, that solution doesn't really work for me as it means we need to also install the NSSM into a Client environment - which could be a problem. Does anyone have a .cmd file they could share for installing as a Windows

Re: What is the best way to index 15 million documents of total size 425 GB?

2016-03-03 Thread Toke Eskildsen
On Fri, 2016-03-04 at 12:41 +0530, Aneesh Mon N wrote: >- is there any difference in posting the data in json format vs xml? >- do we get any performance improvement if we generate the json/xml >files, scp to the solr server and then push via curl command I have not tested that, but

Re: What is the best way to index 15 million documents of total size 425 GB?

2016-03-03 Thread Aneesh Mon N
Hi Jack, Thanks for the response. I have 106 fields in a document: 20 of them are integer, 30 are character varying, and the rest are text fields. We don't have any blob data. To add to this, 40% of the documents are smaller in size as they have little or no content in the text field values (we

Solr /Lucene Payload loading

2016-03-03 Thread KNitin
Hi, I am indexing a bunch of payloads with terms in Solr. I notice at query time that the IO reads increase a lot every time I require the payload to be fetched. Does Solr load the payload from disk every time? Is there any way to force it to be loaded into memory? Thanks, Nitin

How to use geospatial search to find the locations within polygon

2016-03-03 Thread Pradeepchandra Mulpuru
Hi Sir, I have a question on Apache Solr Spatial search. I have a json type data of City, Latitude & Longitude. I indexed those fields with locm_place of the type location_rpt. Now I want to give a polygon as a filter query in order to get the City names located in that polygon. I don't have any

Re: Commit after every document - alternate approach

2016-03-03 Thread sangs8788
When a commit fails, the document doesn't get cleared out from the MQ, and there is a task which runs in the background to republish the files to SOLR. If we do a batch commit we will not know we will end up redoing the same batch commit again. We currently have a client side commit which issue the

Re: Commit after every document - alternate approach

2016-03-03 Thread Walter Underwood
So batch them. You get a response back from Solr saying whether the document was accepted. If that fails, there is a failure. What do you do then? After every 100 docs or one minute, do a commit. Then delete the documents from the input queue. What do you do when the commit fails? wunder Walter
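The batching approach described in this message can be sketched roughly as follows. This is a minimal sketch in Python, not Solr client code: `send`, `commit`, and `ack` are hypothetical callables standing in for posting a document to Solr, issuing a commit, and deleting messages from the input queue. The 100-document and one-minute thresholds come from the message; the `clock` parameter is an assumption added so the timing can be tested.

```python
import time


class BatchCommitter:
    """Collect accepted documents and commit them in batches.

    send, commit and ack are hypothetical callables supplied by the caller:
    send(doc) returns True if the document was accepted, commit() returns
    True if the commit succeeded, and ack(docs) removes docs from the queue.
    """

    def __init__(self, send, commit, ack, max_docs=100, max_seconds=60.0,
                 clock=time.monotonic):
        self.send = send
        self.commit = commit
        self.ack = ack
        self.max_docs = max_docs
        self.max_seconds = max_seconds
        self.clock = clock
        self.pending = []
        self.batch_started = None

    def add(self, doc):
        """Send one document; return False if it was rejected."""
        if not self.send(doc):
            return False  # leave the message on the queue for a retry
        if not self.pending:
            self.batch_started = self.clock()
        self.pending.append(doc)
        too_many = len(self.pending) >= self.max_docs
        too_old = self.clock() - self.batch_started >= self.max_seconds
        if too_many or too_old:
            self.flush()
        return True

    def flush(self):
        """Commit pending documents; acknowledge them only on success."""
        if not self.pending:
            return True
        if not self.commit():
            return False  # nothing is acknowledged, so the batch stays queued
        self.ack(list(self.pending))
        self.pending.clear()
        return True
```

Because documents are acknowledged only after a successful commit, a failed commit leaves the whole batch on the queue, at the cost of possibly re-sending documents that Solr had already accepted.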

Re: Commit after every document - alternate approach

2016-03-03 Thread Walter Underwood
If you need transactions, you should use a different system, like MarkLogic. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 3, 2016, at 8:46 PM, sangs8788 > wrote: > > Hi Emir, > > Right now we are having

Deciding on Solr Nodes and Configuration

2016-03-03 Thread sangs8788
There will be 16 MQs which will be sending documents to SOLR Servers. Below is our expectation, Expected writes per month - 50 Million (inserts only) Size of each document - 10 KB to 70KB Expected reads per month - 10 per month In terms of highest hourly rate Reads - 2000/hour In terms of
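As a rough aid for the sizing question, the write figures quoted above can be turned into a monthly volume and an average hourly rate. This is only back-of-the-envelope arithmetic in Python: it assumes a 30-day month and decimal units, it describes raw input volume rather than on-disk index size, and it ignores the peak hourly rates that the message goes on to list.

```python
WRITES_PER_MONTH = 50_000_000       # from the message
DOC_SIZE_KB = (10, 70)              # smallest and largest document, in KB
HOURS_PER_MONTH = 30 * 24           # assumption: a 30-day month


def monthly_volume_tb(docs, size_kb):
    """Raw document volume in TB (decimal units: 1 TB = 1e9 KB)."""
    return docs * size_kb / 1e9


def average_writes_per_hour(docs, hours=HOURS_PER_MONTH):
    """Average hourly write rate if writes were spread evenly."""
    return docs / hours


low_tb = monthly_volume_tb(WRITES_PER_MONTH, DOC_SIZE_KB[0])
high_tb = monthly_volume_tb(WRITES_PER_MONTH, DOC_SIZE_KB[1])
hourly = average_writes_per_hour(WRITES_PER_MONTH)

print(f"raw input per month: {low_tb:.1f} TB to {high_tb:.1f} TB")
print(f"average writes per hour: {hourly:,.0f}")
```

Under these assumptions the raw input comes to between 0.5 TB and 3.5 TB per month, at an average of roughly 69,000 writes per hour.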

Re: Commit after every document - alternate approach

2016-03-03 Thread sangs8788
Hi Varun, We don't have a SOLR Cloud setup in our system. We have a Master-Slave architecture setup. In that case I don't see a way where SOLR can guarantee whether a document got indexed/committed successfully or not. Even thought about having a flag setup in db for whichever documents committed to

Re: Commit after every document - alternate approach

2016-03-03 Thread sangs8788
Hi Emir, Right now we are having only inserts into SOLR. The main reason for having commit after each document is to get a guarantee that the document has got indexed in solr. Until the commit status is received back the document will not be deleted from MQ. So that even if there is a commit

Re: Solr (5.3.1) doesn't delete orphaned child documents

2016-03-03 Thread Alexandre Rafalovitch
I suspect not (starting from 'delete parent only'). I would check this against Solr 5.5 as it fixed a bunch of parent/child related issues. See, for example, SOLR-5211 Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 4 March

Re: Indexing Twitter - Hypothetical

2016-03-03 Thread Alexandre Rafalovitch
I think some of Twitter's need to index in a particular way comes from their real-time need. So, that's part of the decision for the original poster, on how responsive data needs to be. As to the rest, I think the company that shows twitter messages on TV does something similar with Solr.

Re: Indexing Twitter - Hypothetical

2016-03-03 Thread Jack Krupansky
As always, the initial question needs to be how you wish to query the data - query will drive the data model. I don't want to put words in your mouth as to your query requirements, so... clue us in on your query requirements. -- Jack Krupansky On Thu, Mar 3, 2016 at 2:25 PM, Toke

Re: Add me to the Solr ContributorsGroup

2016-03-03 Thread Shawn Heisey
On 3/3/2016 12:17 PM, Saïd Radhouani wrote: > Actually, I just found my username in the list of names ( > https://wiki.apache.org/solr/ContributorsGroup), however, when I wanted to > create my own page or change an existing one, I got the message: "You are > not allowed to edit this page". What

Re: What is the best way to index 15 million documents of total size 425 GB?

2016-03-03 Thread Jack Krupansky
What does a typical document look like - number of columns, data type, size? How much is text vs. numeric? Are there any large blobs? I mean, 15M docs in 425GB indicates about 28K per row/document which seems rather large. Is the PG data VARCHAR(n) or CHAR(n). IOW, might it have lots of trailing
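The "about 28K per row/document" estimate in this message follows directly from the two figures in the thread subject. A small Python check, assuming decimal gigabytes and kilobytes:

```python
TOTAL_BYTES = 425 * 10**9   # 425 GB, assuming decimal gigabytes
DOC_COUNT = 15_000_000      # 15 million documents


def average_doc_size_kb(total_bytes, doc_count):
    """Average document size in KB (1 KB = 1000 bytes)."""
    return total_bytes / doc_count / 1000


avg_kb = average_doc_size_kb(TOTAL_BYTES, DOC_COUNT)
print(f"{avg_kb:.1f} KB per document")
```

This gives an average of about 28.3 KB per document, which is why the reply asks whether the rows contain large text fields or padded CHAR(n) columns.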

Re: Add me to the Solr ContributorsGroup

2016-03-03 Thread Saïd Radhouani
Actually, I just found my username in the list of names ( https://wiki.apache.org/solr/ContributorsGroup), however, when I wanted to create my own page or change an existing one, I got the message: "You are not allowed to edit this page". Thank you in advance for your collaboration, -SR

Re: Add me to the Solr ContributorsGroup

2016-03-03 Thread Saïd Radhouani
Hello, Could you please add me to the Contributor Group. Here are my account info : - Name: Saïd Radhouani - User name: radhouani - email: said.radhou...@gmail.com For more info about myself, please visit my linked page: https://www.linkedin.com/in/radhouani Thanks, -Saïd 2015-12-30 20:36

Re: Group query in Solrcloud throws exception

2016-03-03 Thread Webster Homer
The query can be as simple as *:* and I still get the error http://localhost:8983/solr/sial-catalog-material_shard1_replica1/select?q=*%3A*=id%2C+sap_material_number%2C+material%2Cname=json=true=true=sap_material_number=-1 Removing the group.limit and the query works. So group.limit in solrcloud

RE: Windows Service

2016-03-03 Thread Pires, Guilherme
Hello, 4 words: Non Sucking Service Manager (https://nssm.cc/) :) Enjoy Guilherme Pires | Principal Architect Portugal | CGI -Original Message- From: Cannon, Steven [mailto:steven.can...@petrotechnics.com] Sent: Thursday, 3 March 2016 19:34 To: solr-user@lucene.apache.org

Windows Service

2016-03-03 Thread Cannon, Steven
Hi, I am trying to install Solr into our Windows Server 2008 Environment so that it uses a Windows Service to start/stop - but am having problems getting it configured. I am using Solr v5.2.1 with Java 7. Can anyone help? :) Regards, Steve Steve Cannon | Enterprise Implementation

Re: Indexing Twitter - Hypothetical

2016-03-03 Thread Toke Eskildsen
Joseph Obernberger wrote: > Hi All - would it be reasonable to index the Twitter 'firehose' with Solr > Cloud - roughly 500-600 million docs per day indexing each of the fields > (about 180)? Possible, yes. Reasonable? It is not going to be cheap. Twitter index the

Group query in Solrcloud throws exception

2016-03-03 Thread WebsterHomer
My company has a search application that we are trying to migrate to SolrCloud. One of the key queries that we execute uses the field collapsing capability (grouping). We are using Solr 4.10.4. We set up a two-node cloud where I created several collections, each with two shards; they all use the

Re: Separating cores from Solr home

2016-03-03 Thread Jeff Wartes
It’s a bit backwards feeling, but I’ve had luck setting the install dir and solr home, instead of the data dir. Something like: -Dsolr.solr.home=/data/solr -Dsolr.install.dir=/opt/solr So all of the Solr files are in /opt/solr and all of the index/core related files end up in /data/solr.

Re: mergeFactor/maxMergeDocs is deprecated

2016-03-03 Thread Daniel Collins
See https://issues.apache.org/jira/browse/SOLR-8734, it will be fixed in the next release On 3 March 2016 at 17:38, Tom Evans wrote: > Hi all > > Updating to Solr 5.5.0, and getting these messages in our error log: > > Beginning with Solr 5.5, is deprecated, configure

Re: XX:ParGCCardsPerStrideChunk

2016-03-03 Thread Jeff Wartes
I've experimented with that a bit, and Shawn added my comments in IRC to his Solr/GC page here: https://wiki.apache.org/solr/ShawnHeisey The relevant bit: "With values of 4096 and 32768, the IRC user was able to achieve 15% and 19% reductions in average pause time, respectively, with the

Indexing Twitter - Hypothetical

2016-03-03 Thread Joseph Obernberger
Hi All - would it be reasonable to index the Twitter 'firehose' with Solr Cloud - roughly 500-600 million docs per day indexing each of the fields (about 180)? If I were to guess at a sharded setup to handle such data, and keep 2 years worth, I would guess about 2500 shards. Is that reasonable?
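To put the numbers in this question in perspective, the total document count and the resulting load per shard can be worked out directly. A small Python sketch, assuming two years means 730 days and that documents are spread evenly across the poster's guessed 2500 shards:

```python
DOCS_PER_DAY = (500_000_000, 600_000_000)   # range from the message
RETENTION_DAYS = 2 * 365                    # two years, ignoring leap days
SHARDS = 2500                               # the poster's guess


def total_docs(docs_per_day, days=RETENTION_DAYS):
    """Documents retained over the whole retention period."""
    return docs_per_day * days


def docs_per_shard(docs_per_day, days=RETENTION_DAYS, shards=SHARDS):
    """Documents per shard, assuming an even distribution."""
    return total_docs(docs_per_day, days) // shards


for rate in DOCS_PER_DAY:
    print(f"{rate:,} docs/day -> {total_docs(rate):,} docs total, "
          f"{docs_per_shard(rate):,} per shard")
```

With these assumptions the index would hold between 365 and 438 billion documents, or roughly 146 to 175 million documents per shard.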

What is the best way to index 15 million documents of total size 425 GB?

2016-03-03 Thread Aneesh Mon N
Hi, We are facing a huge performance issue while indexing the data to Solr. We have around 15 million records in a PostgreSQL database which have to be indexed to a Solr 5.3.1 server. It takes around 16 hours to complete the indexing as of now. To be noted that all the fields are stored so as to

mergeFactor/maxMergeDocs is deprecated

2016-03-03 Thread Tom Evans
Hi all Updating to Solr 5.5.0, and getting these messages in our error log: Beginning with Solr 5.5, is deprecated, configure it on the relevant instead. Beginning with Solr 5.5, is deprecated, configure it on the relevant instead. However, mergeFactor is only mentioned in a commented out

Re: BlockJoinQuery parser and ArrayIndexOutOfBoundException

2016-03-03 Thread Mikhail Khludnev
Hello, I'm happy you could deal with it. I'd appreciate it if you could help collect more info, like the Solr version, the form of the query and the exception trace. Thanks! On Thu, Mar 3, 2016 at 3:28 PM, Sathyakumar Seshachalam < sathyakumar_seshacha...@trimble.com> wrote: > Just a commit after a delete makes the

RE: Prevent the SSL Keystore and Truststore password from showing up in the Solr Admin and Linux processes (Solr 5.2.1)

2016-03-03 Thread Katherine Mora
Hi Zara, I think that is done when you generate the self-signed certificate and the key. If you check the documentation: https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-Generateaself-signedcertificateandakey it says: The "-ext SAN=..." keytool option allows you to

Solr (5.3.1) doesn't delete orphaned child documents

2016-03-03 Thread naeem.tahir
Hi, I noticed some strange behavior when deleting orphaned child documents in Solr 5.3.1. I am indexing nested documents in parent/child hierarchy. When I delete a child document whose parent is already deleted previously, child document still shows up in search. I am using

Re: Prevent the SSL Keystore and Truststore password from showing up in the Solr Admin and Linux processes (Solr 5.2.1)

2016-03-03 Thread Zara Parst
Hello Katherine, I am sorry to ask this question, but I really need some light on the matter below. I want to run Solr in cloud mode, so obviously I am going to use ZooKeeper. My quorum is distributed on 3 servers with static IPs, let's say server.1=xx.xx.x1:2888:3888

Re: Override Default Similarity and SolrCloud

2016-03-03 Thread Joshan Mahmud
Thanks Markus! On Thu, Mar 3, 2016 at 2:17 PM, Markus Jelsma wrote: > Hi - config is stored in ZK, libs must be present on each node and are > rsync there via provisioning. > > Markus > > -Original message- > > From:Joshan Mahmud > >

RE: Override Default Similarity and SolrCloud

2016-03-03 Thread Markus Jelsma
Hi - config is stored in ZK, libs must be present on each node and are rsync there via provisioning. Markus -Original message- > From:Joshan Mahmud > Sent: Thursday 3rd March 2016 15:02 > To: solr-user@lucene.apache.org > Subject: Re: Override Default

Re: FW: Difference Between Tokenizer and filter

2016-03-03 Thread Jack Krupansky
Try re-reading the doc on "Understanding Analyzers, Tokenizers, and Filters" and then ask specific questions on specific statements made in the doc: https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters As far as on-disk format, a Solr user has

RE: FW: Difference Between Tokenizer and filter

2016-03-03 Thread Vanlerberghe, Luc
The "index" type analyzer is used when documents are indexed and determines what tokens end up in the index. The "query" type analyzer is used to analyze the user query and determines what tokens will be searched for. As an example: If you want to be able to match on synonyms, you could have a
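The difference between the two analyzer types can be illustrated with a toy inverted index. This is a minimal Python sketch, not Solr code: the synonym table, the whitespace tokenizer, and the sample documents are all hypothetical, and expanding synonyms on the index side is just one possible arrangement, not necessarily the one the message goes on to describe.

```python
# Hypothetical synonym table; in Solr this would live in the field type's
# analyzer configuration rather than in application code.
SYNONYMS = {"tv": ["television"], "television": ["tv"]}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def index_analyzer(text):
    """Index side: emit each token plus its synonyms."""
    tokens = []
    for token in tokenize(text):
        tokens.append(token)
        tokens.extend(SYNONYMS.get(token, []))
    return tokens


def query_analyzer(text):
    """Query side: tokenize only, no synonym expansion."""
    return tokenize(text)


def build_index(docs):
    """Map each indexed token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in index_analyzer(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents matching every query token."""
    token_hits = [index.get(token, set()) for token in query_analyzer(query)]
    return set.intersection(*token_hits) if token_hits else set()


docs = {1: "Cheap TV stand", 2: "Television repair"}
index = build_index(docs)
print(sorted(search(index, "television")))
```

Here the query "television" matches both documents even though document 1 only contains "TV", because the index-side analyzer stored both tokens while the query-side analyzer left the query untouched.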

RE: Prevent the SSL Keystore and Truststore password from showing up in the Solr Admin and Linux processes (Solr 5.2.1)

2016-03-03 Thread Katherine Mora
Hi Jeff, Are you still having the same issue or did you manage to fix it? I have the default files that come with the Solr 5.2.1 installation (I’m copying those below). I’m pretty sure my password is correct, unless the tool is generating one that does not match my version? I’m using the jetty

Re: Override Default Similarity and SolrCloud

2016-03-03 Thread Joshan Mahmud
Thanks Markus - do you just SCP / copy them manually to your solr nodes and not through Zookeeper (if you use that)? Josh On Thu, Mar 3, 2016 at 1:59 PM, Markus Jelsma wrote: > We store them server/solr/lib/. > > > -Original message- > > From:Joshan Mahmud

RE: Override Default Similarity and SolrCloud

2016-03-03 Thread Markus Jelsma
We store them in server/solr/lib/. -Original message- > From:Joshan Mahmud > Sent: Thursday 3rd March 2016 14:54 > To: solr-user@lucene.apache.org > Subject: Override Default Similarity and SolrCloud > > Hi group! > > I'm having an issue of deploying a custom

Which query to prefer

2016-03-03 Thread Mark Robinson
Hi, I have a 125 million doc index1. I identified 25 values for fieldA in the index. Each value can appear multiple times (1). There is another fieldB in the same index. I identified 6 values for this fieldB. I want only those records in (1) which contain any of these values in fieldB. Query:-

Override Default Similarity and SolrCloud

2016-03-03 Thread Joshan Mahmud
Hi group! I'm having an issue of deploying a custom jar in SolrCloud (v 5.3.0). I have a working local Solr environment (v 5.3.0 - NOT SolrCloud) whereby I have: - a jar containing one class CustomSimilarity which extends org.apache.lucene.search.similarities.DefaultSimilarity - the

RE: FW: Difference Between Tokenizer and filter

2016-03-03 Thread G, Rajesh
Hi Shawn, One last question on analyzer. If the format of the index on disk is not controlled by the tokenizer, or anything else in the analysis chain, then what does type="index" and type="query" in analyzer mean. Can you please help me in understanding?

Uploading files to Zookeeper

2016-03-03 Thread philippa griggs
Hello I have a set of pre-existing configuration files and want to create a new solr cluster. As part of this new cluster I want to name the shards and use a CompositeId router. My core.properties file is: name=sessions shard=${shard:shard1} coreNodeName=${coreNodeName:node1}

Re: facet on two multi-valued fields

2016-03-03 Thread Jan Høydahl
Hi, BlockJoin with Parent/Child is your solution. See http://yonik.com/solr-nested-objects/ and https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 3. mar. 2016 kl. 10.35 skrev Andreas Hubold

RE: FW: Difference Between Tokenizer and filter

2016-03-03 Thread G, Rajesh
Thanks Shawn. This helps

Re: BlockJoinQuery parser and ArrayIndexOutOfBoundException

2016-03-03 Thread Sathyakumar Seshachalam
Just a commit after a delete makes the ArrayIndexOutOfBoundException go-away. And in my scenario, the block join query does seem to work even if there are standalone children at least based on preliminary tests. On 03/03/16, 12:37 PM, "Mikhail Khludnev" wrote: >On

Re: Separating cores from Solr home

2016-03-03 Thread Upayavira
You can add a entry in your solrconfig to point to an entirely different location for libraries. Update them, then you just need to reload cores, not the whole of Solr. Then, try setting the LOG4J_PROPS envvar before starting Solr, so you can have your logging configs somewhere else, and

Re: Separating cores from Solr home

2016-03-03 Thread Tom Evans
Hmm, I've worked around this by setting the directory where the indexes should live to be the actual solr home, and symlink the files from the current release in to that directory, but it feels icky. Any better ideas? Cheers Tom On Thu, Mar 3, 2016 at 11:12 AM, Tom Evans

Re: Solr-kerbarose URL not accessible

2016-03-03 Thread Ishan Chattopadhyaya
This appears to be Cloudera search specific. The kerberos support in Solr is similar to, but not identical with, the kerberos support in Cloudera's Search. Maybe you could check with Cloudera's support? On Fri, Feb 12, 2016 at 8:06 PM, Shawn Heisey wrote: > On 2/12/2016

Separating cores from Solr home

2016-03-03 Thread Tom Evans
Hi all I'm struggling to configure solr cloud to put the index files and core.properties in the correct places in SolrCloud 5.5. Let me explain what I am trying to achieve: * solr is installed in /opt/solr * the user who runs solr only has read only access to that tree * the solr home files -

Re: Does SolrEntityProcessor works with Solr Cloud ?

2016-03-03 Thread Erik Hatcher
Neeraj - SolrEntityProcessor does not yet work with CloudSolrClient, only the direct HTTP one as you’ve determined. Please feel free to file a JIRA for this feature request. However, practically speaking, using a hard-coded HTTP end-point in the SolrEntityProcessor configuration would still

solr 5.5: swap + unload does not work

2016-03-03 Thread Fabrizio Fortino
I have created a Solr CoreAdminHandler extension with the goal to swap two cores and remove the old one. My code looks like this: SolrCore core = coreContainer.create("newcore", coreProps) coreContainer.swap("newcore", "livecore") // the old livecore is now newcore, so unload it and remove all

Does SolrEntityProcessor works with Solr Cloud ?

2016-03-03 Thread Neeraj Bhatt
Hello All, I am trying to import data from one Solr Cloud into another using SolrEntityProcessor. My schema got changed and I need to reindex. 1. Does SolrEntityProcessor work with Solr Cloud to get data from Solr Cloud? It looks like it will not work, as SolrEntityProcessor code is creating

Re: Querying through SolrJ taking lot of time

2016-03-03 Thread Erik Hatcher
Mark - there’s more to the equation than your query, I imagine. Are you returning a larger number of rows or facets? Can you share the Solr log of that request (and perhaps any request handler config if you’ve adjusted that)? Erik > On Mar 3, 2016, at 4:21 AM, Mark Robinson

Re: facet on two multi-valued fields

2016-03-03 Thread Andreas Hubold
Hi, sorry, the subject may have been misleading. I want to get facet results for only one field (tagIds) but restrict the returned values to those with a matching tagDescription. Both multi-valued fields have the same order. Example docs id:"1" tagIds:["10","12","13"]

Querying through SolrJ taking lot of time

2016-03-03 Thread Mark Robinson
Hi, I am running the following query on an index that has around 123 million records, using SolrJ. Each record has only 5 fields. String qry = "(fieldA:(value1 OR value2 OR value24) AND fieldB:(value1 OR value2 OR value3 OR value4 OR value5))"; (...basically a simple AND of 2 ORs) When I