Re: SolrCloud setup - any advice?

2013-09-27 Thread Neil Prosser
Good point. I'd seen docValues and wondered whether they might be of use in this situation. However, as I understand it they require a value to be set for all documents until Solr 4.5. Is that true or was I imagining reading that? On 25 September 2013 11:36, Erick Erickson

Re: cold searcher

2013-09-27 Thread Dmitry Kan
Thanks Shawn, the master-slave setup is something that requires separate study, as our updates are more of the bulk type than small incremental bits (at least at this point). But thanks, this background information is always useful. On Thu, Sep 26, 2013 at 10:52 PM, Shawn Heisey s...@elyograg.org

Re: cold searcher

2013-09-27 Thread Dmitry Kan
Erick, I actually agree, and we are looking into bundling commits into batch-type updates, with soft commits serving the batches and hard commits kicking in at longer intervals. In practice, we have already noticed the periodic slowdowns in search for exactly the same queries before and after
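The batch-then-soft-commit pattern described above can be sketched as follows. This is a minimal sketch that only builds requests and sends nothing. The core URL and batch size are hypothetical, and leaving durable hard commits to autoCommit in solrconfig.xml is an assumption, not something stated in the thread.

```python
import json
from urllib.parse import urlencode

# Hypothetical core URL; substitute your own host and core name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def batched_update_requests(docs, batch_size=1000):
    """Split docs into batches and pair each batch with a soft-commit update URL.

    Returns a list of (url, json_body) tuples; nothing is sent over the network.
    Each body can be POSTed with Content-Type: application/json.
    """
    # commit=true&softCommit=true makes the batch searchable without a hard commit;
    # durable hard commits are assumed to come from <autoCommit> on a longer interval.
    url = SOLR_UPDATE_URL + "?" + urlencode({"commit": "true", "softCommit": "true"})
    return [
        (url, json.dumps(docs[start:start + batch_size]))
        for start in range(0, len(docs), batch_size)
    ]


requests_to_send = batched_update_requests([{"id": str(i)} for i in range(2500)])
```

Fewer, larger batches mean fewer new searchers being opened, which is the point of bundling commits.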

Re: ALIAS feature, can be used for what?

2013-09-27 Thread Yago Riveiro
I need to delete the alias for the old collection before pointing it to the new one, right? -- Yago Riveiro On Friday, September 27, 2013 at 2:25 AM, Otis Gospodnetic wrote: Hi, Imagine you have an index and you need to reindex your data into
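For reference, the two Collections API calls involved look roughly like this. It is a sketch that only builds URLs. The host and the names "products" and "products_v2" are hypothetical. As far as I know, the Collections API documentation describes CREATEALIAS as creating or updating an alias, so re-issuing it against the new collection should re-point the alias without a DELETEALIAS first; verify that against the Solr release you actually run.

```python
from urllib.parse import urlencode

# Hypothetical host; the Collections API is served under /solr/admin/collections.
COLLECTIONS_API = "http://localhost:8983/solr/admin/collections"


def create_alias_url(alias, collection):
    """URL that points `alias` at `collection` (CREATEALIAS)."""
    return COLLECTIONS_API + "?" + urlencode(
        {"action": "CREATEALIAS", "name": alias, "collections": collection}
    )


def delete_alias_url(alias):
    """URL that removes `alias` (DELETEALIAS)."""
    return COLLECTIONS_API + "?" + urlencode({"action": "DELETEALIAS", "name": alias})


# Hypothetical names: reindex into products_v2, then re-point the "products" alias.
switch_url = create_alias_url("products", "products_v2")
```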

Can i trust the order of how documents are received in solrcloud?

2013-09-27 Thread xaon
Hi, I am a new user of SolrCloud, and I am wondering if this scenario could happen: in a shard, I have three machines: leader, replica1, replica2. replica1 received a document D, and right after that, replica2 received an updated version of D, let's call it D'. They all tried to forward their

Re: Doing time sensitive search in solr

2013-09-27 Thread Alexandre Rafalovitch
If your different strings have different semantics (date, etc), you may need to split your entries based on that semantics. Either have the 'entity' represent one 'string-date' structure or have additional field that represents content searchable during that specific period and only have one with
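One way to picture the "one string-date structure per entity" approach is the sketch below. The field names (entity_id, text, valid_from, valid_to) are hypothetical, and the in-memory search_at function only stands in for what Solr would do with a text query plus a date-range filter.

```python
from datetime import date


def split_by_period(entity_id, periods):
    """One entity with several (text, valid_from, valid_to) periods -> one doc per period.

    Field names are hypothetical; each doc keeps entity_id so results can be regrouped.
    """
    return [
        {"id": f"{entity_id}-{i}", "entity_id": entity_id,
         "text": text, "valid_from": start, "valid_to": end}
        for i, (text, start, end) in enumerate(periods)
    ]


def search_at(docs, term, when):
    """Client-side stand-in for a text query plus a date-range filter on the period."""
    return [d for d in docs
            if term in d["text"] and d["valid_from"] <= when <= d["valid_to"]]
```

In Solr itself, the period check would be a range filter on the two date fields, for example fq=valid_from:[* TO NOW] AND valid_to:[NOW TO *].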

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread Andreas Owen
I removed the FieldReaderDataSource and dataSource=fld, but it didn't help. I get the following for each document: DataImportHandlerException: Exception in invoking url null Processing Document # 9 NullPointerException On 26. Sep 2013, at 8:39 PM, P Williams wrote: Hi,

Pubmed XML indexing

2013-09-27 Thread Francisco Fernandez
Hi, I'm a newbie trying to index PubMed texts obtained as XML with a structure similar to: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=23864173,22073418 The nodes I need to extract, expressed as XPaths, would be: //PubmedArticle/MedlineCitation/PMID
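As a quick way to check the XPaths before wiring up any Solr tool, here is a sketch using Python's ElementTree, which supports this simple path syntax. The sample below is a trimmed placeholder, not real efetch output: only the two PMIDs come from the URL above, and the ArticleTitle path and title strings are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed placeholder input; titles are made up, PMIDs are from the efetch URL above.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>23864173</PMID>
      <Article><ArticleTitle>First title</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>22073418</PMID>
      <Article><ArticleTitle>Second title</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_docs(xml_text):
    """Turn each PubmedArticle into a flat dict suitable for a Solr document."""
    root = ET.fromstring(xml_text)
    docs = []
    for article in root.iter("PubmedArticle"):
        docs.append({
            # Paths are relative to the PubmedArticle element.
            "id": article.findtext("MedlineCitation/PMID"),
            "title": article.findtext("MedlineCitation/Article/ArticleTitle"),
        })
    return docs
```

The resulting dicts could be posted to Solr as JSON; the same paths are what an XPath-based importer would need to be configured with.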

Re: Sum function causing error in solr

2013-09-27 Thread Tanu Garg
Yes, Jack, I have tried this, but it gives the same error.

Re: Sum function causing error in solr

2013-09-27 Thread Tanu Garg
I tried this as well, but it's not working.

Re: autocomplete_edge type split words

2013-09-27 Thread elisabeth benoit
Thanks for your answer. So I guess if someone wants to search on two fields, one with a phrase query and one with a normal query (split into words), one has to find a way to send the query twice: once with quotes and once without... Best regards, Elisabeth 2013/9/27 Erick Erickson erickerick...@gmail.com

Re: Pubmed XML indexing

2013-09-27 Thread Alexandre Rafalovitch
Did you look at dataImportHandler? There is also Flume, I think. Regards, Alex On 27 Sep 2013 17:28, Francisco Fernandez fra...@gmail.com wrote: Hi, I'm a newby trying to index PubMed texts obtained as xml with similar structure to:

Solr Commit Time

2013-09-27 Thread Prasi S
Hi, What would be the maximum commit time for indexing 1 lakh (100,000) documents in Solr on a 32 GB machine? Thanks, Prasi

Re: Sum function causing error in solr

2013-09-27 Thread Yonik Seeley
On Fri, Sep 27, 2013 at 2:28 AM, Tanu Garg tanugarg2...@gmail.com wrote: tried this as well. but its not working. It's working fine for me. What version of Solr are you using? What does your complete request look like? -Yonik http://lucidworks.com

Re: Pubmed XML indexing

2013-09-27 Thread Michael Sokolov
You might be interested in Lux (http://luxdb.org), which is designed for indexing and querying XML using Solr and Lucene. It can run index-supported XPath/XQuery over your documents, and you can define arbitrary XPath indexes. -Mike On 9/27/13 6:28 AM, Francisco Fernandez wrote: Hi, I'm a

Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.

2013-09-27 Thread Rafał Radecki
Hi All. I have a Solr 3.5 multicore installation. It has ~250 of documents and ~1.5 GB of index data. When Solr is fed new documents, I see 'Timeout was reached' timeouts on the clients for a few seconds. Is this normal behaviour for Solr while new documents are being inserted? Best regards,

Re: ContributorsGroup

2013-09-27 Thread Erick Erickson
Stefan is more thorough than me, I'd have added the wrong name :) Thanks for volunteering! Erick On Thu, Sep 26, 2013 at 9:17 PM, JavaOne javaone...@yahoo.com wrote: Yes - that is me. mikelabib is my Jira user. Thanks for asking. Sent from my iPhone On Sep 26, 2013, at 7:32 PM, Erick

Re: SolrCloud setup - any advice?

2013-09-27 Thread Erick Erickson
I think you're right, but you can specify a default value in your schema.xml to at least see if this is a good path to follow. Best, Erick On Fri, Sep 27, 2013 at 3:46 AM, Neil Prosser neil.pros...@gmail.com wrote: Good point. I'd seen docValues and wondered whether they might be of use in

Re: autocomplete_edge type split words

2013-09-27 Thread Erick Erickson
Have you looked at autoGeneratePhraseQueries? That might help. If that doesn't work, you can always do something like add an OR clause like OR original query and optionally boost it high. But I'd start with the autoGenerate bits. Best, Erick On Fri, Sep 27, 2013 at 7:37 AM, elisabeth benoit

Re: Solr Commit Time

2013-09-27 Thread Erick Erickson
No way to say. How have you configured your autowarming parameters for instance? Why do you care? What problem are you trying to solve? Solr automatically handles warming up searchers and switching to the new one after a commit. Best, Erick On Fri, Sep 27, 2013 at 7:56 AM, Prasi S

Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.

2013-09-27 Thread Erick Erickson
No, this isn't normal. Probably your servlet container or your clients have too short a timeout. How long are we talking about here anyway? Best, Erick On Fri, Sep 27, 2013 at 8:57 AM, Rafał Radecki r.rade...@polskapresse.pl wrote: Hi All. I have a solr 3.5 multicore installation. It

Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.

2013-09-27 Thread Rafał Radecki
On the client side the timeout is set to 5s, but when I look in the Solr log I see QTime less than 5000 (in ms). We use Jetty to start the Solr process; where should I look for timeout-related settings?

Re: Pubmed XML indexing

2013-09-27 Thread Francisco Fernandez
Many thanks to both Mike and Alexandre. I'll take a look at those tools. Lux seems a good option. Thanks again, Francisco On 27/09/2013, at 09:33, Michael Sokolov wrote: You might be interested in Lux (http://luxdb.org), which is designed for indexing and querying XML using Solr and Lucene. It can

DIH - delta query and delta import query executes transformer twice

2013-09-27 Thread Lee Carroll
Hi, It looks like when a DIH entity has a delta query and a delta import query plus a transformer defined, executing both queries calls the transformer. I was expecting it to only be called on the import query. Sure, we can check for a null value or something and just return the row during the delta

Re: autocomplete_edge type split words

2013-09-27 Thread elisabeth benoit
Yes! What I've done is set autoGeneratePhraseQueries to true for my field, then give it a boost (bq=myAutompleteEdgeNGramField=my query with spaces^50). This only worked with autoGeneratePhraseQueries=true, for a reason I didn't understand, since when I did q= myAutompleteEdgeNGramField=my
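Erick's "add the original query as a boosted clause" idea can be written as request parameters. This sketch uses an explicitly quoted phrase in bq with edismax, which is one standard way to express it. Whether it behaves identically to the autoGeneratePhraseQueries setup described above is not established here, and the boost value of 50 is simply copied from the message.

```python
from urllib.parse import parse_qs, urlencode


def autocomplete_params(user_input, field="myAutompleteEdgeNGramField", boost=50):
    """edismax params: word-split matching via qf plus a boosted whole-input phrase."""
    return {
        "defType": "edismax",
        "q": user_input,
        "qf": field,
        # The whole input as a quoted phrase, boosted so exact sequences rank first.
        # Assumes user_input contains no double quotes; escape them otherwise.
        "bq": f'{field}:"{user_input}"^{boost}',
    }


query_string = urlencode(autocomplete_params("my query with spaces"))
# The raw input survives URL encoding intact.
assert parse_qs(query_string)["q"] == ["my query with spaces"]
```

This keeps everything in one request instead of sending the query twice.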

Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.

2013-09-27 Thread Shawn Heisey
On 9/27/2013 7:41 AM, Rafał Radecki wrote: On client side timeout is set to 5s but when I look in solr log I see QTime less than 5000 (in ms). We use jetty to start solr process, where should I look for directives connected with timeouts? Five seconds is WAY too short a timeout for the

Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.

2013-09-27 Thread Shawn Heisey
On 9/27/2013 8:37 AM, Shawn Heisey wrote: INFO - 2013-09-27 08:27:00.806; org.apache.solr.update.processor.LogUpdateProcessor; [inclive] webapp=/solr path=/update params={wt=javabin&version=2} {add=[notimexpix438424 (144734108581888), notimexpix438425 (1447341085825171456),

Re: Solr Commit Time

2013-09-27 Thread Walter Underwood
Right, it could be minutes or hours. Are the documents five words of plain text or 500 pages of PDF? Is there one simple field, or are you running multiple fields for different languages, plus entity extraction? And so on. Also, some people on this list don't know the term lakh, it is better to

Hello and help :)

2013-09-27 Thread Matheus Salvia
Hello everyone, I'm having a problem with how to write a Solr query; I've posted it on Stack Overflow. Can someone help me? http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter Thanks in advance! -- Matheus Salvia, Mobile Developer

Re: Sum function causing error in solr

2013-09-27 Thread Tanu Garg
solr-4.3.1

Solr MailEntityProcessor not indexing Content-Type: multipart/mixed; emails

2013-09-27 Thread Andrey Padiy
Hi, We're trying to use DIH and MailEntityProcessor but are unable to index emails that have a Content-Type: multipart/mixed; or Content-Type: multipart/related; header. Solr logs show the correct number of emails in the inbox when the IMAP connection is established, but only emails that are of Content-Type:

Solr doesn't return TermVectors

2013-09-27 Thread alibozorgkhan
I followed http://wiki.apache.org/solr/TermVectorComponent step by step but with the following request, I don't get any term vectors: http://localhost:8983/solr/mycol/select?q=id:1211&wt=json&indent=true&tv=true Just to be sure, I have this in my schema: <field name="test_field"

Re: Solr doesn't return TermVectors

2013-09-27 Thread Jack Krupansky
You forgot the qt=custom-request-handler parameter, such as on the wiki: http://localhost:8983/solr/select/?qt=tvrh&q=includes:[* TO *]&fl=id And you need the custom request handler, such as on the wiki: <requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler"> <lst

Re: Solr doesn't return TermVectors

2013-09-27 Thread alibozorgkhan
Thanks for your reply, I actually added that before and it didn't work. I tried it again and no luck.

Re: Solr doesn't return TermVectors

2013-09-27 Thread Jack Krupansky
Show us the response you got. If you did have everything set up 100% properly and are still not seeing term vectors, then maybe you had indexed the data before setting up the full config. In which case, you would simply need to reindex the data. In that case the term vector section would have

Re: Solr doesn't return TermVectors

2013-09-27 Thread alibozorgkhan
Hi Jack, With this query: http://localhost:8983/solr/mycol/select?q=id:1211&wt=json&indent=true&tv=true&qt=tvrh I see all the fields associated with id:1211. I unloaded my collection using the Core Admin panel in solr, removed data and core.properties in my collection, added the core again and

Re: Solr doesn't return TermVectors

2013-09-27 Thread Chris Hostetter
: http://localhost:8983/solr/mycol/select?q=id:1211&wt=json&indent=true&tv=true&qt=tvrh : : I see all the fields associated with id:1211. I unloaded my collection using : the Core Admin panel in solr, removed data and core.properties in my : collection, added the core again and imported the data. :

Re: Solr doesn't return TermVectors

2013-09-27 Thread Shawn Heisey
On 9/27/2013 1:35 PM, Jack Krupansky wrote: You forgot the qt=custom-request-handler parameter, such as on the wiki: http://localhost:8983/solr/select/?qt=tvrh&q=includes:[* TO *]&fl=id And you need the custom request handler, such as on the wiki: <requestHandler name="tvrh"

Re: Solr doesn't return TermVectors

2013-09-27 Thread alibozorgkhan
Hi, - This is the part I added to the solrconfig.xml: <searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/> <requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler"> <lst name="defaults"> <bool name="tv">true</bool>

Re: Solr doesn't return TermVectors

2013-09-27 Thread alibozorgkhan
Shawn !! That is it ! That fixed my problem, I changed name=tvrh to name=/tvrh and used http://localhost:8983/solr/mycol/tvrh instead and now it is returning the term vectors ! Thanx man
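For anyone scripting the request that ended up working, here is a sketch that builds it from the core URL and query used in this thread. The helper name tvrh_url is hypothetical, and it assumes the handler is registered in solrconfig.xml with a leading slash (name="/tvrh") so it can be addressed by path.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def tvrh_url(core_url, query):
    """Request URL for a handler registered in solrconfig.xml as name="/tvrh"."""
    params = {"q": query, "wt": "json", "indent": "true", "tv": "true"}
    return f"{core_url.rstrip('/')}/tvrh?{urlencode(params)}"


url = tvrh_url("http://localhost:8983/solr/mycol", "id:1211")
# The handler is addressed by path, so no qt parameter is needed.
assert urlsplit(url).path == "/solr/mycol/tvrh"
assert "qt" not in parse_qs(urlsplit(url).query)
```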

Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-27 Thread JMill
I am not sure about the value to use for the option popularity. Is there a method, or do you just go with some arbitrary number? On Thursday, September 26, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Great!! I haven't seen your message yet, perhaps you could create a PR

Re: Hello and help :)

2013-09-27 Thread Upayavira
Matheus, Given that these mails form part of an archive that should be self-contained, can you please post your actual question here? You're more likely to get answers that way. Thanks, Upayavira On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote: Hello everyone, I'm having a problem

Re: Cross index join query performance

2013-09-27 Thread Peter Keegan
Hi Joel, I tried this patch and it is quite a bit faster. Using the same query on a larger index (500K docs), the 'join' QTime was 1500 msec, and the 'hjoin' QTime was 100 msec! This was true for both large and small result sets. A few notes: the patch didn't compile with 4.3 because of the

Re: Hello and help :)

2013-09-27 Thread Matheus Salvia
Sure, sorry for the inconvenience. I'm having a little trouble trying to make a query in Solr. The problem is: I must be able to retrieve documents that have the same value for a specified field, but they should only be retrieved if this value appeared more than X times for a specified user. In

Re: Solr doesn't return TermVectors

2013-09-27 Thread Jack Krupansky
You are using components instead of last-components, so you have to list all search components, including the QueryComponent. Better to use last-components. -- Jack Krupansky -Original Message- From: Shawn Heisey Sent: Friday, September 27, 2013 4:02 PM To: solr-user@lucene.apache.org

Re: Solr doesn't return TermVectors

2013-09-27 Thread Shawn Heisey
On 9/27/2013 4:02 PM, Jack Krupansky wrote: You are using components instead of last-components, so you have to list all search components, including the QueryComponent. Better to use last-components. That did it. Thank you! I didn't know why this was a problem even with your note, until I read

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread P Williams
I spent some more time thinking about this. Do you really need to use the TikaEntityProcessor? It doesn't offer anything new to the document you are building that couldn't be accomplished by the XPathEntityProcessor alone from what I can tell. I also tried to get the Advanced

Re: Hello and help :)

2013-09-27 Thread ssami
If I understand your question right, Result Grouping in Solr might help you. Refer here https://cwiki.apache.org/confluence/display/solr/Result+Grouping .

Re: Hello and help :)

2013-09-27 Thread Matheus Salvia
Yes, but how to use result grouping inside a join/subquery? 2013/9/27 ssami ss...@outlook.com If I understand your question right, Result Grouping in Solr might help you. Refer here https://cwiki.apache.org/confluence/display/solr/Result+Grouping . -- View this message in context:

RE: Hello and help :)

2013-09-27 Thread Socratees Samipillai
Sorry, I take it back. I overlooked that you have two different collections. Thanks, — Socratees. Date: Fri, 27 Sep 2013 20:03:46 -0300 Subject: Re: Hello and help :) From: matheus2...@gmail.com To: solr-user@lucene.apache.org Yes, but how to use result grouping inside a join/subquery?

Re: Hello and help :)

2013-09-27 Thread Marcelo Elias Del Valle
Ssami, I work with Matheus and I am helping him to take a look at this problem. We took a look at result grouping, thinking it could help us, but it has two drawbacks: - We cannot have multivalued fields, if I understood it correctly. But ok, we could manage that... - Suppose some

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread Andreas Owen
OK, I see what you're getting at, but why doesn't the following work: <field xpath="//h:h1" column="h_1" /> <field column="text" xpath="/xhtml:html/xhtml:body" /> I removed the Tika processor. What am I missing? I haven't found anything in the wiki. On 28. Sep 2013, at 12:28 AM, P

Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-27 Thread Ing. Jorge Luis Betancourt Gonzalez
Actually I don't use that field. It could be used to do some form of basic collaborative filtering, so you could use a high value for items in your collection that you want to come first, but in my case this was not a requirement and I don't use it at all. - Original message - From:
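To make the role of the popularity weight concrete, here is a toy model, not Solr's Suggester implementation: suggestions are matched by prefix and then ordered by weight, highest first. The terms and weights are made up. In this toy, giving every entry the same constant weight leaves the order entirely to the tiebreaker (alphabetical here), which is why an arbitrary constant is fine when ranking is not a requirement.

```python
def suggest(prefix, entries, limit=5):
    """Toy weighted prefix suggester; entries are (term, weight) pairs.

    Matches by prefix, orders by weight (highest first), breaks ties alphabetically.
    """
    matches = [(term, weight) for term, weight in entries if term.startswith(prefix)]
    matches.sort(key=lambda tw: (-tw[1], tw[0]))
    return [term for term, _ in matches[:limit]]


# Made-up terms and weights for illustration.
ENTRIES = [("solr", 1), ("solrcloud", 10), ("solarium", 5), ("lucene", 100)]
```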

RE: Hello and help :)

2013-09-27 Thread Socratees Samipillai
Hi Marcelo, I haven't faced this exact situation before so I can only try posting my thoughts. Since Solr allows Result Grouping and Faceting at the same time, and since you can apply filters on these facets, can you take advantage of that? Or, what if you can facet by the field, and group by
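The faceting idea can be expressed as a single request against one collection. This is a sketch under assumptions: userID is the user field as named later in the thread, while "category", user "42", and the threshold are hypothetical. It finds which values occur more than x times for one user; it does not address the two-collection join Marcelo mentions.

```python
from urllib.parse import urlencode


def frequent_value_facet_params(user_id, field, x, user_field="userID"):
    """Facet on `field` over one user's docs, keeping only values seen more than x times."""
    return {
        "q": "*:*",
        "fq": f"{user_field}:{user_id}",
        "rows": 0,                 # only the facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.mincount": x + 1,   # mincount is inclusive, so "more than x" is x + 1
        "facet.limit": -1,         # return every qualifying value, not just the top 100
    }


query_string = urlencode(frequent_value_facet_params("42", "category", 3))
```

The returned facet values could then drive a second query (an fq on those values) to fetch the matching documents.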

RE: Hello and help :)

2013-09-27 Thread Socratees Samipillai
Also, try the #solr and #solr-dev IRC channels at Freenode http://webchat.freenode.net/ Thanks, — Socratees. From: ss...@outlook.com To: solr-user@lucene.apache.org Subject: RE: Hello and help :) Date: Fri, 27 Sep 2013 17:23:28 -0700 Hi Marcelo, I haven't faced this exact situation

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread Alexandre Rafalovitch
This is a rather complicated example to chew through, but try the following two things: *) dataField=${tika.text} => dataField=text (or less likely htmlMapper tika.text) You might be trying to read content of the field rather than passing reference to the field that seems to be expected. This

Issue in parallel Indexing using multiple csv files

2013-09-27 Thread zaheer.java
Using Solr 4.4, I'm trying to index a Solr core using a CSV file of around 1 million records. To improve performance, I've split the CSV file into smaller files and used the CSV update handler for each file, each running in a separate thread. The outcome was weird. The total count of Solr
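The post is cut off before the actual symptom, so this does not claim to explain it. One pitfall worth ruling out when splitting is that the CSV handler treats the first line of each file as the header by default (unless header=false and fieldnames are given), so every chunk needs the header row. Also, naive line splitting can cut a quoted field that contains a newline in half. A sketch of a split that avoids both, using Python's csv module:

```python
import csv
import io


def split_csv(text, rows_per_chunk):
    """Split CSV text into chunks, repeating the header row at the top of each chunk.

    Uses a real CSV parser, so quoted fields containing newlines stay intact.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    chunks, current = [], []
    for row in reader:
        current.append(row)
        if len(current) == rows_per_chunk:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)

    out = []
    for rows in chunks:
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
        out.append(buf.getvalue())
    return out
```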

Re: Hello and help :)

2013-09-27 Thread Upayavira
To phrase your need more generically: * find all documents for userID=x, where userID=x has more than y documents in the index Is that correct? If it is, I'd probably do some work at index time. First guess, I'd keep a separate core, which has a very small document per user, storing just: *
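The index-time approach can be sketched with in-memory dicts standing in for the two cores. The post is cut off before listing what the per-user document stores, so a single doc_count field is an assumption, and both function names are hypothetical. The expensive counting happens once at index time; at query time a cheap lookup decides whether to run the main query at all.

```python
from collections import Counter


def build_user_summaries(docs, user_field="userID"):
    """Index time: one tiny summary doc per user holding that user's document count."""
    counts = Counter(d[user_field] for d in docs)
    return {user: {"id": user, "doc_count": n} for user, n in counts.items()}


def docs_if_user_has_more_than(docs, summaries, user_id, y, user_field="userID"):
    """Query time: a lookup in the summary 'core' gates the main query."""
    summary = summaries.get(user_id)
    if summary is None or summary["doc_count"] <= y:
        return []
    return [d for d in docs if d[user_field] == user_id]
```

The summary counts have to be updated whenever a user's documents are added or deleted, which is the trade-off for the fast lookup.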