Group by multiple fields

2013-06-06 Thread Benjamin Ryan
Hi, Is it possible to create a query similar in function to multiple SQL group by clauses? I have documents that have single-valued fields for host name and collection name and would like to group the results by both, e.g. a result would contain a count of the

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-06 Thread Prathik Puthran
My use case is I want to search for any substring of the indexed string and the Suggester should suggest the indexed string. What can I do to make this work? Thanks, Prathik On Thu, Jun 6, 2013 at 2:05 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Please excuse my misunderstanding,

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-06 Thread Upayavira
Can you use the ShingleFilterFactory? It is ngrams for terms rather than characters. If you limited it to two term ngrams, when the user presses space after their first word, you could do a suggested query against your two term ngram field, which would suggest Jason Bourne, Jason Statham, etc then

SOLR CSV output in custom order

2013-06-06 Thread anurag.jain
I want the CSV output in a proper order. When I use wt=csv, the output comes back in random order. Is there any way to get the output in a proper format? Thanks

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-06 Thread Prathik Puthran
This works even now, i.e. when I search for Jas it suggests Jason Bourne. What I want is when I search for Bour or ason (any substring) it should suggest Jason Bourne. On Thu, Jun 6, 2013 at 12:34 PM, Upayavira u...@odoko.co.uk wrote: Can you use the ShingleFilterFactory? It is ngrams for

Re: Solr: separating index and storage

2013-06-06 Thread Sourajit Basak
Absolutely. Solr will return the reference along the docs/results; those references may be used to look-up the actual stuff. Such use cases aren't hard to solve. If the use case demands returning the actual stuff alongside the results, it becomes non-trivial, especially during high loads. To

Filtering on results with more than N words.

2013-06-06 Thread Dotan Cohen
Is there any way to restrict the search results to only those documents with more than N words / tokens in the searched field? I thought that this would be an easy one to Google for, but I cannot figure it out or find any references. There are many references to word size in characters, but not

Re: data-import problem

2013-06-06 Thread Stavros Delisavas
I tried to deactivate the uniquekey, but that made Solr not work at all. I got Error 500 for everything (no admin page, etc). So I had to reactivate it. This is my current configuration as you recommended. Unfortunately still no improvement. The second table doesn't get recorded. I included

Re: Heap space problem with mlt query

2013-06-06 Thread Varsha Rani
Hi, As per suggestions, changed in my config file as: reduced document cache size from 31067 to 16384 and autowarmcount from 2046 to 1024. My machine RAM size is 16GB , 1 GB RAM used as index of 85GB started. my config file as: <ramBufferSizeMB>128</ramBufferSizeMB> <filterCache

Re: Heap space problem with mlt query

2013-06-06 Thread Stavros Delisavas
I recently had the same issue, which could be fixed very easily. Add the property batchSize=-1 to your dataSource-tag. Tell me if that helped. On 06.06.2013 11:30, Varsha Rani wrote: Hi, As per suggestions, changed in my config file as: reduced document cache size from 31067 to 16384

Re: Solr 4.2.1 higher memory footprint vs Solr 3.5

2013-06-06 Thread Bernd Fehling
On 05.06.2013 18:09, SandeepM wrote: /So we see the jagged edge waveform which keeps climbing (GC cycles don't completely collect memory over time). Our test has a short capture from real traffic and we are replaying that via solrmeter./ Any idea why the memory climbs over time. The GC

Solr indexing slows down

2013-06-06 Thread Sebastian Steinfeld
Hi, I am new to Solr and we want to use Solr to speed up our product search. And it is working really nicely, but I think I have a problem with the indexing. It slows down after a few minutes. I am using the DataImportHandler to import the products from the database. And I start the import by

Re: Heap space problem with mlt query

2013-06-06 Thread Varsha Rani
Hi Stavros, I checked it with batchSize=-1, but still the same issue. My single mlt query is: http://machine_ip:8983/solr/News/mlt?q=field1:34358471&qt=/mlt&mlt.match.include=true&mlt=true&mlt.mindf=1&mlt.mintf=1&mlt.minwl=3&mlt.boost=true&fq=cat:News; AND date:[136644000 TO 1362827444000]

Re: copyField generates multiple values encountered for non multiValued field

2013-06-06 Thread Robert Krüger
On Wed, Jun 5, 2013 at 9:12 PM, Jack Krupansky j...@basetechnology.com wrote: Look in the Solr log - the error message should tell you what the multiple values are. For example, 95484 [qtp2998209-11] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: ERROR:

Re: Search across multiple collections

2013-06-06 Thread Erick Erickson
You pretty much need to issue separate queries against each collection and creatively combine them. All of Solr's distributed search stuff pre-supposes two things: 1) the schemas are very similar; 2) the types of docs in each collection are also very similar. 2) is a bit subtle. If you store

Re: Filtering on results with more than N words.

2013-06-06 Thread Jack Krupansky
I don't recall seeing any such filter. Sounds like a good idea though. Although, maybe it is another good idea that really isn't too necessary for solving many real world problems. -- Jack Krupansky -Original Message- From: Dotan Cohen Sent: Thursday, June 06, 2013 3:45 AM To:

Re: copyField generates multiple values encountered for non multiValued field

2013-06-06 Thread Robert Krüger
I don't know what I have to do to use the atomic update feature but I am not aware of using it. But the way you describe it, it means that the copyField directive does not overwrite the existing field content and that's an easy explanation to what is happening in my case. Then the second update

Re: Group by multiple fields

2013-06-06 Thread Erick Erickson
There may be a terminology problem here. In Solr land, grouping aka field collapsing governs how the results are returned. But from your example, it looks like you really want summary counts rather than return documents grouped by some field. If you want counting, take a look at pivot faceting,
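A minimal sketch of the kind of pivot-faceting request Erick points to. The field names `host` and `collection` and the core URL are assumptions for illustration, not taken from the thread. `facet.pivot` returns a count per host and, nested inside each host, a count per collection, which is roughly what a two-column SQL GROUP BY with COUNT(*) would give.

```python
from urllib.parse import urlencode


def pivot_facet_url(base_url, fields, query="*:*"):
    """Build a Solr select URL that asks for nested counts over `fields`."""
    params = [
        ("q", query),
        ("rows", 0),                        # counts only, no documents
        ("facet", "true"),
        ("facet.pivot", ",".join(fields)),  # e.g. host,collection
        ("wt", "json"),
    ]
    return base_url + "/select?" + urlencode(params)


# Hypothetical core URL and field names.
url = pivot_facet_url("http://localhost:8983/solr/collection1", ["host", "collection"])
print(url)
```

The order of the fields in `facet.pivot` decides the nesting, so `host,collection` counts collections within each host.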

Re: SOLR CSV output in custom order

2013-06-06 Thread Erick Erickson
What happens if you include a sort clause? Warning, I've never tried it myself... Best Erick On Thu, Jun 6, 2013 at 3:11 AM, anurag.jain anurag.k...@gmail.com wrote: I want output of csv file in proper order. when I use wt=csv it gives output in random order. Is there any way to get output

Re: Solr: separating index and storage

2013-06-06 Thread Erick Erickson
By and large, stored fields are pretty irrelevant for resource consumption _except_ for disk space consumed. Sharded systems work fine, the stored data is stored in the index files (*.fdt and *.fdx) in each segment on each shard. But you haven't told us anything about your data. How much

Re: copyField generates multiple values encountered for non multiValued field

2013-06-06 Thread Jack Krupansky
1. Try a simple curl command to add the document. 2. Check to see if maybe there is a duplicate copyField directive in your schema. How many copyField directives do you have? At least we know that it is exactly the same value duplicated and not some other value. -- Jack Krupansky

Re: Heap space problem with mlt query

2013-06-06 Thread Erick Erickson
Your cache sizes are still much too large. I wouldn't expect the changes you outlined to change anything. And your autowarm sizes are still far too big. The default sizes are 512 and 0 for size and autowarm counts. Try those. In fact, Solr will happily function (admittedly with slower queries) if

Auto-Suggest, spell check dictionary replication to slave issue

2013-06-06 Thread msreddy.hi
Hi All, We create 2 dictionaries from an indexed field for the auto-suggest and spell check features. When we configured replication from master to slaves, the index replicates properly but the auto-suggest and spell check dictionaries do not. Is there a way to replicate auto-suggest, spell check dictionary

Re: copyField generates multiple values encountered for non multiValued field

2013-06-06 Thread Jack Krupansky
read current state, manipulate fields and then add the document with the same id) Ahh... then you have an IMPLICIT reference to the field in your Java code - you explicitly told Solr that you wanted to start with all existing field values. Just because a field is the target of a copyField

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-06 Thread Prathik Puthran
Basically I want the Suggester to return Jason Bourne as a suggestion for the .*Bour.* regex. Thanks, Prathik On Thu, Jun 6, 2013 at 12:52 PM, Prathik Puthran prathik.puthra...@gmail.com wrote: This works even now, i.e. when I search for Jas it suggests Jason Bourne. What I want is when I

AW: Heap space problem with mlt query

2013-06-06 Thread André Widhani
I am just reading through this thread by chance, but doesn't this exception: Caused by: org.apache.solr.common.SolrException: Error in xpath:/config/luceneMatchVersion for solrconfig.xml org.apache.solr.common.SolrException: Error in xpath:/config/luceneMatchVersion for solrconfig.xml

Re: Solr: separating index and storage

2013-06-06 Thread Sourajit Basak
Each day the index grows by ~250 MB; however I am anticipating that this growth will slow down because there will be repetitions (just a guess). It's not the order of growth but the limitation of our infrastructure. Basically a budgetary constraint :-) Apparently there seems to be no problem than disk

Re: Schema Change: Int - String

2013-06-06 Thread Jack Krupansky
1. Generally, any schema change requires a full reindex. Sure, a lot of times you can squeak by, but with Solr and Lucene there are no guarantees. If it works for you, great. If not, don't complain - just reindex. And even if it does work for the current release, there is no guarantee that a

Re: Filtering on results with more than N words.

2013-06-06 Thread Walter Underwood
Someone else asked about this recently. The best approach is to count the words at index time and add a field with the count, so title and title_len or something like that. wunder On Jun 6, 2013, at 4:20 AM, Jack Krupansky wrote: I don't recall seeing any such filter. Sounds like a good idea
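The index-time counting Walter describes could look roughly like this on the client side, before the document is sent to Solr. This is only a sketch: the `title_len` naming follows his example, the whitespace split is a stand-in for whatever the real analyzer does, and the helper names are made up for illustration.

```python
def add_word_count(doc, field, count_field=None):
    """Return a copy of `doc` with a whitespace-based word count for `field`."""
    count_field = count_field or field + "_len"
    out = dict(doc)
    out[count_field] = len(str(doc.get(field, "")).split())
    return out


def more_than_n_words(count_field, n):
    """Solr range filter matching documents with more than `n` words."""
    return "%s:[%d TO *]" % (count_field, n + 1)


doc = add_word_count({"id": "1", "title": "The Bourne Identity"}, "title")
fq = more_than_n_words("title_len", 5)
```

The count field would need to be declared as an integer type in the schema for the range filter to behave numerically.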

Download CSV, Strange thing is happening !!

2013-06-06 Thread anurag.jain
I have two fields in Solr, named 10th_mark and 12th_mark. Now I want to download those fields as CSV, so I tried http://localhost:8983/solr?q=*:*&wt=csv&start=0&rows=10&fl=10th_mark,12th_mark But the output is something like th_mark. But if I put *th_mark, it gives me the correct output.

Re: Filtering on results with more than N words.

2013-06-06 Thread Jack Krupansky
Yeah, but part of the problem is that an input string is not converted to words until analysis, which doesn't happen until after Solr creates the Lucene Document and hands it off to Lucene. In other words (Ha!Ha!), there are no words during the Solr-side of indexing. That said, you can always

Re: Download CSV, Strange thing is happening !!

2013-06-06 Thread Raymond Wiker
I think you'd be better off using field names that look like Java identifiers - e.g, mark10 instead of 10th_mark. Actually, let me rephrase that: you SHOULD be using field names that look like Java identifiers - less headache, all round. On Thu, Jun 6, 2013 at 4:01 PM, anurag.jain

Re: Solr 4.2.1 higher memory footprint vs Solr 3.5

2013-06-06 Thread Shawn Heisey
On 6/6/2013 3:50 AM, Bernd Fehling wrote: What helped me a lot was switching to G1GC. Faster, smoother, very little ripple, nearly no sawtooth. When I tried G1, it did indeed produce a better looking memory graph, but it didn't do anything about my GC pauses. They were several seconds with

Lucene Filter That Will Remove Some Tokens By Regex Pattern?

2013-06-06 Thread Furkan KAMACI
I want to use a core Lucene filter that will remove some tokens defined by a regex pattern. What is the appropriate class for it?

Re: copyField generates multiple values encountered for non multiValued field

2013-06-06 Thread Robert Krüger
On Thu, Jun 6, 2013 at 1:52 PM, Jack Krupansky j...@basetechnology.com wrote: read current state, manipulate fields and then add the document with the same id) Ahh... then you have an IMPLICIT reference to the field in your Java code - you explicitly told Solr that you wanted to start with

Re: Lucene Filter That Will Remove Some Tokens By Regex Pattern?

2013-06-06 Thread Walter Underwood
On Jun 6, 2013, at 7:24 AM, Furkan KAMACI wrote: I want to use a core Lucene filter that will remove some tokens defined by a regex pattern. What is the appropriate class for it? Use a pattern replace filter. That will give you zero-length tokens, which can cause odd matches. Follow it with a

Re: Download CSV, Strange thing is happening !!

2013-06-06 Thread Jack Krupansky
Yeah, Java-like identifiers are best. You should be able to wrap non-Java names in a field function: fl=field(10th_mark),field(12th_mark) -- Jack Krupansky -Original Message- From: Raymond Wiker Sent: Thursday, June 06, 2013 10:12 AM To: solr-user@lucene.apache.org Subject: Re:

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-06 Thread Mikhail Khludnev
Got it. It's actually in contrast to the usual prefix suggestions. So, out of the box it's provided by http://wiki.apache.org/solr/TermsComponent with terms.regex= (also see the last example there). It should work by loading terms in memory and linearly scanning them with the regexp. There is nothing more efficient

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-06 Thread Walter Underwood
Let's clear up some things about how Solr works. 1. Solr matches individual words, not the whole text. So Jason Bourne is split into [Jason, Bourne]. The leading .* in your pattern does not match preceding words, it would match the beginning of a single word. 2. Query time wildcards test every

Re: Download CSV, Strange thing is happening !!

2013-06-06 Thread anurag.jain
fl=field(10th_mark),field(12th_mark) If I use wt=csv, it gives me no output; with wt=json it gives me output.

Re: data-import problem

2013-06-06 Thread bbarani
The below error clearly says that you have declared a unique id but that unique id is missing for some documents. org.apache.solr.common.SolrException: [doc=null] missing required field: nameid This is mainly because you are just trying to import 2 tables in to a document without any

Re: Group by multiple fields

2013-06-06 Thread bbarani
Not sure if this solution will work for you, but this is what I did to implement nested grouping using SOLR 3.X. The simple idea behind it is to concatenate 2 fields, index them into a single field, and group on that field.
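A rough sketch of the concatenation idea, done on the client before indexing. The field names, the `group_key` target field, and the `|` separator are all assumptions; the separator should be a character that cannot occur in either value, and the combined field should be an untokenized string type so that each pair stays a single term.

```python
SEPARATOR = "|"  # assumed not to occur in either field value


def add_group_key(doc, fields, key_field="group_key"):
    """Return a copy of `doc` with the given fields joined into one key field."""
    out = dict(doc)
    out[key_field] = SEPARATOR.join(str(doc[name]) for name in fields)
    return out


doc = add_group_key({"id": "7", "host": "web01", "collection": "news"},
                    ["host", "collection"])

# Grouping on the combined field then uses the regular grouping parameters.
group_params = {"q": "*:*", "group": "true", "group.field": "group_key"}
```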

Re: Filtering on results with more than N words.

2013-06-06 Thread Walter Underwood
I was thinking of counting the words before the field is indexed. It is quite possible that splitting on white space would be sufficient. Of course, some idea of what problem this is supposed to solve would be very helpful. wunder On Jun 6, 2013, at 7:07 AM, Jack Krupansky wrote: Yeah, but

Re: data-import problem

2013-06-06 Thread Stavros Delisavas
It's surprising to me that all tables have to have a relationship in order to be used in Solr. What if I have two independent projects running on the same webserver? I would not be able to use Solr for both of them, really? That would be very disappointing... Anyway, luckily there is an

Re: Solr indexing slows down

2013-06-06 Thread Michael Della Bitta
Hi Sebastian, What database are you using? How much RAM is available on your machine? It looks like you're selecting from a view... Have you tried paging through the view outside of Solr? Does that slow down as well? Do you notice any increased load on the Solr box or the database server?

Re: data-import problem

2013-06-06 Thread Walter Underwood
When designing for Solr (or most search engines), think in terms of documents, not tables. What do your search results look like? You will want one document for each search result. The document will have stored fields for each thing displayed and indexed fields for each thing searched. If you

Re: Images in the Solr Wiki

2013-06-06 Thread Chris Hostetter
: Request to infra filed... : : https://issues.apache.org/jira/browse/INFRA-6345 FYI: Fixed. -Hoss

Re: data-import problem

2013-06-06 Thread bbarani
You don't really need to have a relationship, but the unique id should be unique in a document. I mentioned the relationship because the unique key was present only in one table but not the other. Check out this link for more information on importing multiple table data.

Re: Solr indexing slows down

2013-06-06 Thread Shawn Heisey
On 6/6/2013 4:13 AM, Sebastian Steinfeld wrote: The amount of documents I want to index is 8 million, the first 1.6 million are indexed in 2min, but to complete the import it takes nearly 2 hours. The size of the index on the hard drive is 610MB. I started the solr server with 2GB memory. I

Re: data-import problem

2013-06-06 Thread Stavros Delisavas
Unfortunately my two tables do not share a unique key. They both have integers as keys starting with number 1. Is there any way to overcome this problem? Removing the uniquekey-property from my schema.xml leads to Solr not working (I have tried that already). The link you provided is showing

Re: Images in the Solr Wiki

2013-06-06 Thread Michael Della Bitta
Thanks a lot for your help! Michael Della Bitta

Re: data-import problem

2013-06-06 Thread Shawn Heisey
On 6/6/2013 11:15 AM, Stavros Delisavas wrote: Unfortunately my two tables do not share a unique key. They both have integers as keys starting with number 1. Is there any way to overcome this problem? Removing the uniquekey-property from my schema.xml leads to Solr not working (I have tried that

Re: data-import problem

2013-06-06 Thread Stavros Delisavas
Perfect! This finally worked! Shawn, thank you a lot! How do I set up multiple cores? Again, thank you so much! I was looking for a solution for days! On 06.06.2013 19:23, Shawn Heisey wrote: On 6/6/2013 11:15 AM, Stavros Delisavas wrote: Unfortunately my two tables do not share a unique

Re: data-import problem

2013-06-06 Thread Shawn Heisey
On 6/6/2013 11:38 AM, Stavros Delisavas wrote: Perfect! This finally worked! Shawn, thank you a lot! How do I set up multiple cores? Again, thank you so much! I was looking for a solution for days! Cores are defined in solr.xml - the default example core is named collection1. I am

Re: data-import problem

2013-06-06 Thread bbarani
Not sure if I understand your situation. I am not sure how you would relate the data between 2 tables if there's no relationship? You are trying to just dump random values from 2 tables into a document? Consider Table1 (Name, id): peter 1, john 2, mike 3. Table2 (Title, TitleId): CEO

Re: data-import problem

2013-06-06 Thread Stavros Delisavas
Think about movies and the cast of a movie. There are movies (title) which have their unique ids. And there are many people (name) like the producer, actors, etc. which have their unique ids. But there are people who have been an actor in more than one movie. That's why I have a third table which

Re: data-import problem

2013-06-06 Thread Stavros Delisavas
That's okay. For now, I guess it is okay. Finally I could import all 6.6 million entries successfully. I am happy. On 06.06.2013 19:44, Shawn Heisey wrote: On 6/6/2013 11:38 AM, Stavros Delisavas wrote: Perfect! This finally worked! Shawn, thank you a lot! How do I set up multiple cores?

OutOfMemory while indexing (PROD environment!)

2013-06-06 Thread Isaac Hebsh
Hi everyone, My SolrCloud cluster (4.3.0) went into production a few days ago. Docs are being indexed into Solr using the /update requestHandler, as a POST request with a text/xml content-type. The collection is sharded into 36 pieces, each shard has two replicas. There are 36 nodes (each

OR query with null value and non-null value(s)

2013-06-06 Thread Rahul R
I have recently enabled facet.missing=true in solrconfig.xml which gives null facet values also. As I understand it, the syntax to do a faceted search on a null value is something like this: fq=-price:[* TO *] So when I want to search on a particular value (for example : 4) OR null value, I would

Re: OR query with null value and non-null value(s)

2013-06-06 Thread Shawn Heisey
On 6/6/2013 12:28 PM, Rahul R wrote: I have recently enabled facet.missing=true in solrconfig.xml which gives null facet values also. As I understand it, the syntax to do a faceted search on a null value is something like this: fq=-price:[* TO *] So when I want to search on a particular value
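The reply above is cut off, so the following is not necessarily what Shawn suggested. It is a sketch of the commonly used form for "this value OR no value at all": a purely negative clause inside a boolean sub-query matches nothing by itself, so it is anchored with `*:*`. The helper name is made up; `price` and `4` come from the question.

```python
def value_or_missing(field, value):
    """Filter matching docs where `field` equals `value` or has no value."""
    return "%s:%s OR (*:* -%s:[* TO *])" % (field, value, field)


fq = value_or_missing("price", 4)
print(fq)
```

The resulting string would be passed as the `fq` parameter in place of the plain `-price:[* TO *]` filter.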

new xslt

2013-06-06 Thread Christopher Gross
In 3.x Solr (and earlier) I was able to create a new xslt doc in the conf/xslt directory and immediately start using it. In my 4.1 setup, I have: <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter"> <int name="xsltCacheLifetimeSeconds">5</int> </queryResponseWriter> But after that small

LotsOfCores feature

2013-06-06 Thread Aleksey
I was looking at this wiki and linked issues: http://wiki.apache.org/solr/LotsOfCores they talk about a limit being 100K cores. Is that per server or per entire fleet because zookeeper needs to manage that? I was considering a use case where I have tens of millions of indices but less than a

Re: Solr: separating index and storage

2013-06-06 Thread Erick Erickson
bq: I am anticipating that this growth will slow down because there will be repetitions This will be true for your indexed data, but NOT for your stored data. Each stored field is stored as-is per document. It'll be compressed, so won't take up the entire 250M, but it'll still be stored. FWIW,

Re: LotsOfCores feature

2013-06-06 Thread Erick Erickson
100K is really not the limit, it's just hard to imagine 100K cores on a single machine unless some were really rarely used. And it's per node, not cluster-wide. The current state is that everything is in place, including transient cores, auto-discovery, etc. So you should be able to go ahead and

Request to be added to ContributorsGroup

2013-06-06 Thread Josh Lincoln
Hello Wiki Admins, I have been using Solr for a few years now and I would like to contribute back by making minor changes and clarifications to the wiki documentation. Wiki User Name : JoshLincoln Thanks

Re: Request to be added to ContributorsGroup

2013-06-06 Thread Erick Erickson
Done, thanks! On Thu, Jun 6, 2013 at 3:47 PM, Josh Lincoln josh.linc...@gmail.com wrote: Hello Wiki Admins, I have been using Solr for a few years now and I would like to contribute back by making minor changes and clarifications to the wiki documentation. Wiki User Name : JoshLincoln

Solr 4.1 over Websphere errors

2013-06-06 Thread abillavara
hi all We are having a problem getting Solr4.1 (Solr 4.3 is also not starting) to run in Websphere on Windows. Websphere version? [8.0.0.3] Windows version? [Win7 64bit] Solr version? [4.1] JDK version? [1.7.0_13 64bit] Here is the error that none of us have ever seen before. Can somebody

Re: new xslt

2013-06-06 Thread Upayavira
On Thu, Jun 6, 2013, at 07:54 PM, Christopher Gross wrote: In 3.x Solr (and earlier) I was able to create a new xslt doc in the conf/xslt directory and immediately start using it. In my 4.1 setup, I have: <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter"> <int

nutch 1.4, solr 3.4 configuration error

2013-06-06 Thread Isaac Stennett
I am trying to configure nutch 1.4 with solr 3.4. I configured everything and when I run the command: ./nutch crawl urls -dir myCrawl2 -solr http://localhost:8080 -depth 2 -topN 2 I get the following error: java.io.IOException: Job failed! SolrDeleteDuplicates: starting at 2013-06-06 15:49:30

Re: nutch 1.4, solr 3.4 configuration error

2013-06-06 Thread bbarani
Can you check whether you have the correct solrj client library version in both Nutch and the Solr server?

Re: Solr 4.1 over Websphere errors

2013-06-06 Thread Shawn Heisey
On 6/6/2013 1:57 PM, abillav...@innoventsolutions.com wrote: We are having a problem getting Solr4.1 (Solr 4.3 is also not starting) to run in Websphere on Windows. Websphere version? [8.0.0.3] Windows version? [Win7 64bit] Solr version? [4.1] JDK version? [1.7.0_13 64bit] Based on seeing

Re: Solr 4.1 over Websphere errors

2013-06-06 Thread bbarani
As suggested by Shawn try to change the JVM, this might resolve your issue. I had seen this error ':java.lang.VerifyError' before (not specific to SOLR) when compiling code using JDK1.7. After some research I figured out the code compiled using Java 1.7 requires stack map frame instructions. If

Re: Solr 4.1 over Websphere errors

2013-06-06 Thread Chris Hostetter
: Based on seeing java.lang.J9VMInternals in your log, I am guessing that your : JVM is IBM's J9, not Oracle. The Java from IBM is notoriously buggy when it : comes to running Lucene and Solr. Try Oracle, version 1.7.0_21. Note specifically the excellent verbiage here...

Re: Auto-Suggest, spell check dictionary replication to slave issue

2013-06-06 Thread bbarani
Seems like this feature is yet to be implemented: https://issues.apache.org/jira/browse/SOLR-866

Re: nutch 1.4, solr 3.4 configuration error

2013-06-06 Thread Chris Hostetter
: ./nutch crawl urls -dir myCrawl2 -solr http://localhost:8080 -depth 2 -topN ... : Caused by: org.apache.solr.common.SolrException: Not Found : : Not Found : : request: http://localhost:8080/select?q=id:[* TO : *]&fl=id&rows=1&wt=javabin&version=2 ... : Other possibly helpful

Re: Solr 4.1 over Websphere errors

2013-06-06 Thread Anria
Thank you. This sure is a lot to chew on.

Re: OutOfMemory while indexing (PROD environment!)

2013-06-06 Thread Otis Gospodnetic
Hi, Try running jstat to see if the heap is full. 4gb is not much and could easily be eaten by structures used for sorting, faceting, and caching. Plug: SPM has a new feature that lets you send graphs with various metrics to Solr mailing list. I'd personally look at the GC graphs to see if GC

Re: LotsOfCores feature

2013-06-06 Thread Erick Erickson
Now Jack. You know it depends <G>. Just answer the questions how many simultaneous cores can you open on your hardware, and what's the maximum percentage of the cores you expect to be open at any one time. Do some math and you have your answer. The meta-data, essentially anything in the core

Re: Filtering on results with more than N words.

2013-06-06 Thread Jack Krupansky
From the book, here's an update request processor chain which will count the words in the content field and place it in the content_len_I field. Then you could do a range query on that count. <updateRequestProcessorChain name="regex-count-words"> <!-- Start with a copy of the content field -->

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-06 Thread Otis Gospodnetic
Hi, Ngrams *will* do this for you. Otis Solr & ElasticSearch Support http://sematext.com/ On Jun 6, 2013 7:53 AM, Prathik Puthran prathik.puthra...@gmail.com wrote: Basically I want the Suggester to return Jason Bourne as a suggestion for the .*Bour.* regex. Thanks, Prathik On Thu, Jun 6,
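To illustrate why character n-grams cover the substring case, here is a small stand-alone sketch. It is not Solr code and not Otis's configuration; it only mimics what an n-gram token filter such as solr.NGramFilterFactory does at index time, with gram sizes chosen arbitrarily. Every substring within the size limits becomes an indexed term, so a plain term lookup for "bour" or "ason" finds the entry without a leading wildcard.

```python
def char_ngrams(token, min_size, max_size):
    """All substrings of `token` whose length is between min_size and max_size."""
    grams = set()
    for size in range(min_size, max_size + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def index_terms(text, min_size=2, max_size=5):
    """Lowercase, split on whitespace, and n-gram every word."""
    terms = set()
    for word in text.lower().split():
        terms |= char_ngrams(word, min_size, max_size)
    return terms


terms = index_terms("Jason Bourne")
print("bour" in terms, "ason" in terms)
```

The trade-off is index size: each word contributes many terms, so the gram size limits are usually kept small.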

[blogpost] Memory is overrated, use SSDs

2013-06-06 Thread Toke Eskildsen
Inspired by multiple Solr mailing list entries during the last month or two, I did some search performance testing on our 11M documents / 49GB index using logged queries on Solr 4 with MMapDirectory. It turns out that our setup with Solid State Drives and 8GB of RAM (which leaves 5GB for disk

Re: LotsOfCores feature

2013-06-06 Thread Aleksey
I would not try putting tens of millions of cores on one machine. My question (and I think Jack's as well) was around having them across a fleet, say if I need 1M then I'd get 100 machines appropriately sized for 10K each. I was clarifying because there was some talk about ZooKeeper only being

Re: [blogpost] Memory is overrated, use SSDs

2013-06-06 Thread Shawn Heisey
Inspired by multiple Solr mailing list entries during the last month or two, I did some search performance testing on our 11M documents / 49GB index using logged queries on Solr 4 with MMapDirectory. It turns out that our setup with Solid State Drives and 8GB of RAM (which leaves 5GB for disk

Re: LotsOfCores feature

2013-06-06 Thread Jack Krupansky
I'm glad Erick finally answered my question (I think I actually asked it on the original Jira) concerning the rough magnitude of Lots - it's hundreds/thousands, but not hundreds of thousands, millions, or tens of millions. So, if an app needs millions, I think that suggests a MegaCores

RE: [blogpost] Memory is overrated, use SSDs

2013-06-06 Thread Toke Eskildsen
Shawn Heisey [s...@elyograg.org]: This is awesome! Concrete info is better than speculation. Thank you. I think it might be time to split the SSD section of SolrPerformanceProblems into its own wiki page and expand it. That might be a good idea. It would also be interesting to try and

HdfsDirectoryFactory

2013-06-06 Thread Jamie Johnson
I've seen reference to an HdfsDirectoryFactory in the new Cloudera Search along with a commit in the Solr SVN ( http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml?view=markup), is this something that is being made part of the core?

Re: Can't find solr.xml

2013-06-06 Thread Anria
Nabeel, I just want to say that though this post is very old, of everything on the internet about this error, your suggestion of moving out of /home/user/solr into /opt/solr was the one that worked for me too. Thank you! Anria

Re: Schema Change: Int - String (i am the original poster, new email address)

2013-06-06 Thread z z
3. Too hard to say from the way you have described it. Show us some sample input. Jack, Here you go. *Row X* column1: data here column2: more data here ... user_id: 2002 *Row Y* column1: data here column2: more data here ... user_id: 45 *Row Z* column1: data here column2: more data here ...

Re: Schema Change: Int - String (i am the original poster, new email address)

2013-06-06 Thread z z
I want to query against one user_id in the string. eg user_id:2002+AND+created:[${from}+TO+${until}]+data:more So all of the records with a 2002 in user_id need to be returned and only those records. If this can only be guaranteed by having user_id be an integer, then that is fine, but I would

Re: Schema Change: Int - String (i am the original poster, new email address)

2013-06-06 Thread Jack Krupansky
Okay, now, how about a few queries that you want to use? Do you want to query by parts of the user ID, or only by the whole (exact) value? If the user ID will be a string, fine, but having spaces makes it a little more painful to enter in a query - maybe use dashes. -- Jack Krupansky

Re: Schema Change: Int - String (i am the original poster, new email address)

2013-06-06 Thread z z
e.g. user_id:2002+AND+created:[${from}+TO+${until}]+data:more Expected results: return rows X, Y and Z but ignore this row: column1: data here column2: more data here ... user_id: 45 15001 45664 *Row X* column1: data here column2: more data here ... user_id: 2002 *Row Y* column1: data here

Re: Schema Change: Int - String (i am the original poster, new email address)

2013-06-06 Thread Jack Krupansky
In that case, you will need to keep two copies of the user ID, one which is a single, complete string, and one which is a tokenized field text/TextField so that you can do a keyword search against it. Use the string/StrField as the main copy and then use a copyField directive in the schema to
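
A rough sketch of the two-copy idea described above, written as a toy model in plain Python rather than Solr schema or API calls. The field name user_id_text, the sample documents, and whitespace tokenization are all assumptions made for illustration; in Solr the copy would come from a copyField directive and the tokenization from the text field's analyzer.

```python
# Toy model only: plain Python, not Solr's API or schema syntax.
# Sample documents are hypothetical.
docs = [
    {"id": 1, "user_id": "2002"},
    {"id": 2, "user_id": "45"},
    {"id": 3, "user_id": "45 2002 15001"},
    {"id": 4, "user_id": "45 15001 45664"},
]

def copy_field(doc):
    """Mimic copying user_id into a tokenized field (hypothetical name: user_id_text)."""
    out = dict(doc)
    out["user_id_text"] = doc["user_id"].split()  # assumed whitespace tokenization
    return out

indexed = [copy_field(d) for d in docs]

def match_string(value):
    """String-style field: only the whole, exact value matches."""
    return [d["id"] for d in indexed if d["user_id"] == value]

def match_text(token):
    """Tokenized field: any single token in the value matches."""
    return [d["id"] for d in indexed if token in d["user_id_text"]]

exact_hits = match_string("2002")
keyword_hits = match_text("2002")
```

In this model the string copy finds only the document whose whole value is 2002, while the tokenized copy also finds the document that lists 2002 among several IDs, and neither matches the document holding 45 15001 45664.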

Re: Schema Change: Int - String (i am the original poster, new email address)

2013-06-06 Thread z z
The unique key is an auto-incremented int in the db. Sorry for having given the impression that user_id is the unique key per document. This is a table of events that are happening as users interact with our system. It just so happens that we were inserting individual records for each user

Re: LotsOfCores feature

2013-06-06 Thread Shawn Heisey
On 6/6/2013 6:32 PM, Jack Krupansky wrote: big snip This would be a lot more of a true Solr Cloud than the cluster support that we have today. And the CloudKeeper itself might be a traditional SolrCloud cluster, except that it needs to be multi-data center. I like a lot of what you said

Re: Schema Change: Int - String (i am the original poster, new email address)

2013-06-06 Thread Jack Krupansky
To be clear, one normally doesn't do queries on portions of an ID - usually it is one integrated string. Further, strings are definitely NOT tokenized in Solr. Your story keeps changing, which is why I have to keep hedging my answers. At least with your latest story, your user_id should be a

Re: Schema Change: Int - String (i am the original poster, new email address)

2013-06-06 Thread z z
My language might be a bit off (I am saying string when I probably mean text in the context of solr), but I'm pretty sure that my story is unwavering ;) `id` int(11) NOT NULL AUTO_INCREMENT `created` int(10) `data` varbinary(255) `user_id` int(11) So, imagine that we have 1000 entries come in

Re: [blogpost] Memory is overrated, use SSDs

2013-06-06 Thread Andy
This is very interesting. Thanks for sharing the benchmark. One question I have: did you precondition the SSD (http://www.sandforce.com/userfiles/file/downloads/FMS2009_F2A_Smith.pdf)? SSD performance tends to take a very deep dive once all blocks are written at least once and the garbage

Re: OR query with null value and non-null value(s)

2013-06-06 Thread Rahul R
Thank you Shawn. This does work. To help me understand better, why do we need the *:* ? Shouldn't it be implicit ? Shouldn't fq=(price:4+OR+(-price:[* TO *])) //does not work mean the same as fq=(price:4+OR+(*:* -price:[* TO *])) //works Why does Solr need the *:* there ? On Fri, Jun

Re: OR query with null value and non-null value(s)

2013-06-06 Thread Shawn Heisey
On 6/6/2013 11:21 PM, Rahul R wrote: Thank you Shawn. This does work. To help me understand better, why do we need the *:* ? Shouldn't it be implicit ? Shouldn't fq=(price:4+OR+(-price:[* TO *])) //does not work mean the same as fq=(price:4+OR+(*:* -price:[* TO *])) //works Why
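
One way to picture the difference between the two filter queries quoted above is with sets of document IDs. This is only a toy model in Python, not Solr or Lucene code, and the document IDs are made up. It assumes the usual Lucene behavior that a negative clause can only remove documents from whatever the positive clauses in the same group matched, so a group with nothing but a negative clause has nothing to remove from.

```python
# Toy model only: Python sets standing in for sets of matching document IDs.
# The IDs below are hypothetical.
all_docs = {1, 2, 3, 4}     # what *:* matches
price_is_4 = {1}            # what price:4 matches
has_price = {1, 2, 3}       # what price:[* TO *] matches (doc 4 has no price)

def boolean_group(positive, negative):
    """A negative clause only subtracts from what the positive clauses found."""
    return positive - negative

# (-price:[* TO *]) : no positive clause, so the starting set is empty
pure_negative = boolean_group(set(), has_price)

# (*:* -price:[* TO *]) : start from every document, then subtract
with_match_all = boolean_group(all_docs, has_price)

does_not_work = price_is_4 | pure_negative   # price:4 OR (-price:[* TO *])
works = price_is_4 | with_match_all          # price:4 OR (*:* -price:[* TO *])
```

Under these assumptions the first form returns only the document with price 4, and the second also returns the document that has no price at all.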