Re: how to index 20 MB plain-text xml

2014-03-31 Thread Floyd Wu
Hi Alex, Thanks for responding. Personally I don't want to feed these big XML files to Solr, but the users want to. I'll try your suggestions later. Many thanks. Floyd 2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch arafa...@gmail.com: Without digging too deep into why exactly this is happening,

Re: how to index 20 MB plain-text xml

2014-03-31 Thread primoz . skale
Hi! I had the same issue with XML files. Even small XML files produced an OOM exception. I read that the way XML is parsed can sometimes blow up memory requirements so much that Java runs out of heap. My solution was: 1. Don't parse XML files 2. Parse only small XML files and hope for

Unsuccessful queries for terms next to tabs and newlines in uploaded Word documents

2014-03-31 Thread chtjfi
Short Version: What do I need to do to successfully query for terms that are adjacent to tabs and newlines (i.e. \t, \n) in an uploaded Word document? Long Version: I am using Solr 4.6.1. I am running an unmodified version of the example core that is started by running java -jar start.jar in the

Strange behavior while deleting

2014-03-31 Thread abhishek jain
Hi friends, I have observed a strange behavior. I have two indexes with the same IDs and the same number of docs, and I am using a JSON file to delete records from both indexes. After deleting the IDs, the resulting indexes show different doc counts. Not sure why. I used curl with the same json

Re: Expected date of release for Solr 4.7.1

2014-03-31 Thread Puneet Pawaia
Thanks for the update, Mike. Regards Puneet On Sat, Mar 29, 2014 at 11:58 PM, Michael McCandless luc...@mikemccandless.com wrote: RC2 is being voted on now ... so it should be soon (a few days, but more if any new blocker issues are found and we need to do RC3). Mike McCandless

Re: Strange behavior while deleting

2014-03-31 Thread Jack Krupansky
Do the two cores have identical schema and solrconfig files? Are the delete and merge config settings identical? Are these two cores running on the same Solr server, or two separate Solr servers? If the latter, are they both running the same release of Solr? How big is the

Re: MergingSolrIndexes not supported by SolrCloud?why?

2014-03-31 Thread rulinma
I think the problem may be with my cluster. I will adjust it and test again. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/MergingSolrIndexes-not-supported-by-SolrCloud-why-tp4127111p4128113.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Unsuccessful queries for terms next to tabs and newlines in uploaded Word documents

2014-03-31 Thread Jack Krupansky
What field type and analyzer are you using? Normally, both the standard and whitespace tokenizers will break tokens at all white space, which includes tabs. Check your df and qf parameters to see that they are querying the attr_content field. Query the attr_content field directly, as a test.

Re: eDismax parser and the mm parameter

2014-03-31 Thread Jack Krupansky
The pf, pf2, and pf3 parameters should cover cases 1 and 2. Use q.op=OR (the default) and ignore the mm parameter. Give pf the highest boost, and boost pf3 higher than pf2. You could try using the complex phrase query parser for the third case. -- Jack Krupansky -Original Message-
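The boost ordering described above can be sketched as a set of request parameters. This is only an illustration: the field name `title` and the boost values 10, 5 and 2 are assumptions chosen to show the ordering (pf highest, pf3 above pf2), not values from the thread.

```python
from urllib.parse import urlencode

def edismax_params(user_query, field="title"):
    """Build eDismax parameters with pf > pf3 > pf2 boosts.

    The field name and boost values are hypothetical placeholders.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "q.op": "OR",           # match any term; mm is left unset
        "qf": field,
        "pf": f"{field}^10",    # whole-query phrase match, highest boost
        "pf3": f"{field}^5",    # word-trigram phrase matches
        "pf2": f"{field}^2",    # word-bigram phrase matches
    }

# Query string to append to the select handler URL of your own core.
query_string = urlencode(edismax_params("red leather sofa"))
```

The resulting string can be appended to whatever select URL your installation exposes; tune the boosts against your own relevance tests.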

Re: Strange behavior while deleting

2014-03-31 Thread abhishek . netjain
Hi, These settings are commented out in the schema. These are two different Solr servers with almost identical schemas, with the exception of one stemmed field. The same Solr versions are running. Please help. Thanks, Abhishek Original Message From: Jack Krupansky Sent: Monday, 31 March 2014 14:54 To:

get sub-facets based on main-facet selections

2014-03-31 Thread Jan Verweij - Reeleez
Dear, I'm implementing a productcatalog and have 5 main facets and 60+ possible subfacets. If I select a specific value from one of my main facets, let's say, productgroupX, I want to show the facets related to this productgroup, say length and height. But if productgroupY is selected I have to

Re: Strange behavior while deleting

2014-03-31 Thread Jack Krupansky
So, how big is the discrepancy? If you do a *:* query for rows=100, is the 100th result the same for both? Do a bunch of random queries and see if you can find a document key that is missing from one core, but present in the other, and check if it should have been deleted. Are you deleting
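The check suggested above (find a document key present in one core but missing from the other) can be sketched as a set comparison. This assumes you have already fetched the unique keys from each core by some means, for example a *:* query returning only the key field; the sample ID lists below are made up.

```python
def diff_ids(ids_a, ids_b):
    """Return (keys only in core A, keys only in core B), each sorted."""
    a, b = set(ids_a), set(ids_b)
    return sorted(a - b), sorted(b - a)

# Hypothetical key lists standing in for the two cores' contents.
only_in_a, only_in_b = diff_ids(["1", "2", "3", "5"], ["1", "2", "4", "5"])
```

Any key that shows up in only one list is a candidate to check against the delete file.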

Re: More Robust Search Timeouts (to Kill Zombie Queries)?

2014-03-31 Thread Salman Akram
Anyone? On Wed, Mar 26, 2014 at 7:55 PM, Salman Akram salman.ak...@northbaysolutions.net wrote: With reference to this thread http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E I wanted to know if there was

Re: Product index schema for solr

2014-03-31 Thread Ajay Patel
Hi Erick, Thanks for the reply :). Your solution helped me denormalize my data. Now I have another question: can I create a generalized range facet according to min_qty and max_qty? Thanks, Regards, Ajay Patel. On Saturday 29 March 2014 08:54 PM, Erick Erickson wrote: The usual approach is

Re: Multiple Languages in Same Core

2014-03-31 Thread Jeremy Thomerson
Thanks Trey! Last week I ordered the eBook. I look forward to seeing the information in it. Jeremy On Thu, Mar 27, 2014 at 6:03 PM, Trey Grainger solrt...@gmail.com wrote: In addition to the two approaches Liu Bo mentioned (separate core per language and separate field per language), it is

Re: get sub-facets based on main-facet selections

2014-03-31 Thread Erick Erickson
Have you looked at pivot facets? It _might_ help here with the first part. That said, pivot facets can be expensive (as always, it depends) and the two-query solution might be better, gotta test. About the second part: bq: one of my main facets returns with just a single value Not sure how

solr 4.2.1 index gets slower over time

2014-03-31 Thread elisabeth benoit
Hello, We are currently using solr 4.2.1. Our index is updated on a daily basis. After noticing solr query time has increased (two times the initial size) without any change in index size or in solr configuration, we tried an optimize on the index but it didn't fix our problem. We checked the

Re: More Robust Search Timeouts (to Kill Zombie Queries)?

2014-03-31 Thread Luis Lebolo
Hi Salman, I was interested in something similar, take a look at the following thread: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201401.mbox/%3CCADSoL-i04aYrsOo2%3DGcaFqsQ3mViF%2Bhn24ArDtT%3D7kpALtVHzA%40mail.gmail.com%3E#archives I never followed through, however. -Luis On

Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread Shawn Heisey
On 3/31/2014 6:57 AM, elisabeth benoit wrote: We are currently using solr 4.2.1. Our index is updated on a daily basis. After noticing solr query time has increased (two times the initial size) without any change in index size or in solr configuration, we tried an optimize on the index but it

Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2014-03-31 Thread Rishi Easwaran
The SSD is separated into logical volumes. Each instance gets 100 GB of SSD disk space to write its index. If I add them all up it's ~45GB in 1TB of SSD disk space. Not sure I get "You should not be running more than one instance of Solr per machine. One instance of Solr can run multiple indexes."

Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread elisabeth benoit
Hello, Thanks for your answer. We use JVisualVM. The CPU usage is very high (90%), but the GC activity shows less than 0.01% average activity. Plus the heap usage stays low (below 4G while the max heap size is 16G). Do you have a different tool to suggest to check the GC? Do you think there is

RE: setting up solr on tomcat

2014-03-31 Thread Lieberman, Ariel
In Tomcat 7 there was a bug with resolving URLs ending in /. This should be fixed in Tomcat 7.0.5+, see SOLR-2022 for full details. -Original Message- From: Pradeep Pujari [mailto:prade...@rocketmail.com] Sent: Monday, March 24, 2014 4:02 AM To: solr-user@lucene.apache.org Subject: Re:

Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread Shawn Heisey
On 3/31/2014 9:03 AM, elisabeth benoit wrote: We use JVisualVM. The CPU usage is very high (90%), but the GC activity shows less than 0.01% average activity. Plus the heap usage stays low (below 4G while the max heap size is 16G). Do you have a different tool to suggest to check the GC? Do you

Request for adding to Contributors Group

2014-03-31 Thread Aditya Choudhuri
Hello! Please add my email and SolrWiki account in the ContributorsGroup. My Wiki name = AdityaChoudhuri https://wiki.apache.org/solr/AdityaChoudhuri Thank you. Aditya

Re: Request for adding to Contributors Group

2014-03-31 Thread Steve Rowe
Aditya, I’ve added your username to the Solr ContributorsGroup page, so you should now be able to edit wiki pages. Steve On Mar 31, 2014, at 1:25 PM, Aditya Choudhuri adi...@jumpstartsys.com wrote: Hello! Please add my email and SolrWiki account in the ContributorsGroup. My Wiki name

What is Overseer?

2014-03-31 Thread Chris W
What is the role of an overseer in solrcloud? The documentation does not offer full details about it. What if an overseer node goes down? -- Best -- C

Re: New to Solr can someone help me to know if Solr fits my use case

2014-03-31 Thread Saurabh Agarwal
Thanks a lot Alexandre for the response much appreciated. Thanks Saurabh On Fri, Mar 28, 2014 at 8:56 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: 1. You don't actually put PDF/Word into Solr. Instead, it is run through content and metadata extraction process and then index that. This

How to delete documents

2014-03-31 Thread Kaushik
From a database table, we have figured out a way to do the full load and the delta loads. However, there are scenarios where some of the DB rows get deleted. How can we have such documents deleted from SOLR indices? Thanks, Kaushik

Filter caching

2014-03-31 Thread youknow...@heroicefforts.net
Re-reading the documentation, it seems that Solr caches the results of the fq parameter, not lower level field constraints. This would imply that breaking a single complex boolean filter into multiple conjunctive fq parameters would improve the odds for cache hits. Is this correct? fq=(a:foo
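As a rough illustration of the idea in the question, the sketch below sends each conjunctive clause as its own fq parameter instead of one combined boolean expression, so that each clause can be cached and reused independently. The clauses `category:books` and `in_stock:true` are hypothetical examples, not the ones from the original message.

```python
from urllib.parse import urlencode

def split_filter_params(query, clauses):
    """Build request parameters with one fq per conjunctive clause."""
    params = [("q", query)]
    params.extend(("fq", clause) for clause in clauses)
    return params

# Two separate fq parameters rather than fq=(category:books AND in_stock:true).
filter_query_string = urlencode(
    split_filter_params("*:*", ["category:books", "in_stock:true"])
)
```

Multiple fq parameters are intersected, so this only applies to clauses joined by AND; a disjunction still has to stay inside a single fq.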

Re: how to index 20 MB plain-text xml

2014-03-31 Thread Upayavira
Tell the user they can't have it! Or, write a small app that reads in their XML in one go, and pushes it in parts to Solr. Generally, I'd say letting a user hit Solr directly is a bad thing - especially a user who doesn't know the details of how Solr works. Upayavira On Mon, Mar 31, 2014, at 07:17

Re: Filter caching

2014-03-31 Thread Yonik Seeley
On Mon, Mar 31, 2014 at 2:43 PM, youknow...@heroicefforts.net youknow...@heroicefforts.net wrote: Re-reading the documentation, it seems that Solr caches the results of the fq parameter, not lower level field constraints. This would imply that breaking a single complex boolean filter into

Re: What is Overseer?

2014-03-31 Thread Furkan KAMACI
Hi Chris; You should check here: http://grokbase.com/t/lucene/solr-user/12bd9kst9t/role-purpose-of-overseer Thanks; Furkan KAMACI 2014-03-31 20:43 GMT+03:00 Chris W chris1980@gmail.com: What is the role of an overseer in solrcloud? The documentation does not offer full details about it.

Re: Enabling other SimpleText formats besides postings

2014-03-31 Thread Ken Krugler
Hi all (and particularly Uwe and Robert), On Mar 28, 2014, at 7:24am, Michael McCandless luc...@mikemccandless.com wrote: You told the fieldType to use SimpleText only for the postings, not all other parts of the codec (doc values, live docs, stored fields, etc...), and so it used the

Re: Enabling other SimpleText formats besides postings

2014-03-31 Thread Erik Hatcher
On Mar 31, 2014, at 4:02 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Hi all (and particularly Uwe and Robert), On Mar 28, 2014, at 7:24am, Michael McCandless luc...@mikemccandless.com wrote: You told the fieldType to use SimpleText only for the postings, not all other parts of

Re: Enabling other SimpleText formats besides postings

2014-03-31 Thread Shawn Heisey
On 3/31/2014 2:36 PM, Erik Hatcher wrote: Not currently possible. Solr’s SchemaCodecFactory only has a hook for postings format (and doc values format). Erik Would it be a reasonable thing to develop a config structure (probably in schema.xml) that starts with something like codec

spellcheck in solr-4.6-1 distrib=true

2014-03-31 Thread alxsss
Hello, For queries in SolrCloud and in distributed mode, solr-4.6.1 spellcheck does not return any suggestions, but it does in non-distributed mode. Is this a known bug? Thanks. Alex.

Re: Enabling other SimpleText formats besides postings

2014-03-31 Thread Ken Krugler
Hi Erik (& Shawn), On Mar 31, 2014, at 1:48pm, Shawn Heisey s...@elyograg.org wrote: On 3/31/2014 2:36 PM, Erik Hatcher wrote: Not currently possible. Solr’s SchemaCodecFactory only has a hook for postings format (and doc values format). OK, thanks for confirming. Would it be a reasonable

Re: What is Overseer?

2014-03-31 Thread Jack Krupansky
So, is Overseer really only an implementation detail or something that Solr Ops guys need to be very aware of? -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Monday, March 31, 2014 3:17 PM To: solr-user@lucene.apache.org Subject: Re: What is Overseer? Hi Chris; You

Re: how to index 20 MB plain-text xml

2014-03-31 Thread Floyd Wu
Hi Upayavira, Users don't hit Solr directly; they search documents through my application. The application is an entry point for users to upload documents, which are then indexed by Solr. The situation is that they upload a plain-text file, something like a dictionary. You know, a dictionary is something big. I'm trying

Re: how to index 20 MB plain-text xml

2014-03-31 Thread Alexandre Rafalovitch
If you have an application, why are you sending XML documents to Solr? Can't you convert it to any other format and then send them in batches? Or even if it is XML, just bite and send in 100 document batches. Or in smaller batches and use auto-commit settings I mentioned earlier. Regards,
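A minimal sketch of the batching suggested above: split the parsed documents into groups of 100 and send each group as its own update request. The sending step is left out because it depends on your client; only the chunking is shown, and the `range(250)` input is a stand-in for real documents.

```python
from itertools import islice

def batches(docs, size=100):
    """Yield successive lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# 250 placeholder documents split into batches of 100.
batch_sizes = [len(batch) for batch in batches(range(250))]
```

Because the generator pulls from an iterator, the full document set never has to sit in memory at once if the source is itself streamed.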

ranking retrieval measure

2014-03-31 Thread azhar2007
Hi people. I've developed a search engine, using another search engine as a test case while implementing and improving it. Now I want to compare and test results from both to determine which is better. I don't know how to do this, so could someone please point me in the right direction? Regards -- View this

RE: How to delete documents

2014-03-31 Thread Suresh Soundararajan
Kaushik, Before deleting the rows in the table, collect the primary IDs of the rows that correspond to documents in the Solr index, then send Solr a delete-by-ID request with the collected IDs. This will remove the documents from the Solr index. Thanks, SureshKumar.S From:
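The delete-by-ID step can be sketched as building the XML delete message that Solr's update handler accepts. The IDs 101 and 102 are placeholders for the primary keys collected from the database; posting the message and committing are left to your own client code.

```python
import xml.etree.ElementTree as ET

def delete_by_id_xml(ids):
    """Build a <delete> message with one <id> element per document key."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")

# Placeholder keys standing in for rows about to be removed from the table.
delete_payload = delete_by_id_xml([101, 102])
```

The IDs must be values of the field declared as the uniqueKey in your schema, and the deletions only become visible to searches after a commit.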

Re: eDismax parser and the mm parameter

2014-03-31 Thread S.L
Jack, Thanks a lot. I am now using pf, pf2 and pf3 and have gotten rid of the mm parameter from my queries. However, for the fuzzy phrase queries, I am not sure how I would be able to leverage the Complex Query Parser; there is absolutely nothing out there that gives me any idea as to how to

Re: Product index schema for solr

2014-03-31 Thread Ajay Patel
As per your suggestion, my final schema will be like { id:unique_id ... ... [PRODUCT RELATED DATA] ... ... ... min_qty: 1 max_qty: 50 price: 4 } [OTHER DATA LIKE THE ABOVE] Now I want to create a range facet field by combining min_qty and

Solr indexing javabean

2014-03-31 Thread Prasi S
Hi, My Solr document has a field that contains XML. I index the XML as-is into Solr, and at runtime I get the XML, parse it, and display it. Instead of XML, can we index that XML as a Java bean? Thanks, Prasi

Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread Dmitry Kan
Hi, We have noticed something like this as well, but with an older version of Solr, 3.4. In our setup we delete documents pretty often. Internally in Lucene, when a client requests that a document be deleted, it is not physically deleted, but only marked as deleted. Our original optimization