TimeAllowed bug

2015-08-24 Thread Bill Bell
Weird fq caching bug when using timeAllowed

1. Find a pwid (in this case YLGVQ).
2. Run a query w/ an FQ on the pwid and timeAllowed=1.
   http://hgsolr2devsl.healthgrades.com:8983/solr/providersearch/select/?q=*:*&wt=json&fl=pwid&fq=pwid:YLGVQ&timeAllowed=1
3. Ensure #2 returns 0 results.
4. Rerun the query without the timeAllowed param.
   http://hgsolr2devsl.healthgrades.com:8983/solr/providersearch/select/?q=*:*&wt=json&fl=pwid&fq=pwid:YLGVQ
5. Note that after removing the timeAllowed parameter the query is still returning 0 results.

 Solr seems to be caching the FQ when the timeAllowed parameter is present.
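
A possible workaround while the bug stands (a sketch, reusing the same pwid field as above): mark the filter as non-cacheable with the cache=false local param so a partial result produced under timeAllowed never lands in the filterCache:

  http://hgsolr2devsl.healthgrades.com:8983/solr/providersearch/select/?q=*:*&wt=json&fl=pwid&fq={!cache=false}pwid:YLGVQ&timeAllowed=1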


Bill Bell
Sent from mobile



Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Bill Bell
We use 8gb to 10gb for those size indexes all the time.


Bill Bell
Sent from mobile


 On Aug 23, 2015, at 8:52 AM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
 Hi Shawn,
 
 Yes, I've increased the heap size to 4GB already, and I'm using a machine
 with 32GB RAM.
 
 Is it recommended to further increase the heap size to like 8GB or 16GB?
 
 Probably not, but I know nothing about your data.  How many Solr docs
 were created by indexing 1GB of data?  How much disk space is used by
 your Solr index(es)?
 
 I know very little about clustering, but it looks like you've gotten a
 reply from Toke, who knows a lot more about that part of the code than I do.
 
 Thanks,
 Shawn
 


Re: solr multicore vs sharding vs 1 big collection

2015-08-03 Thread Bill Bell
Yeah a separate by month or year is good and can really help in this case.

Bill Bell
Sent from mobile


 On Aug 2, 2015, at 5:29 PM, Jay Potharaju jspothar...@gmail.com wrote:
 
 Shawn,
 Thanks for the feedback. I agree that increasing timeout might alleviate
 the timeout issue. The main problem with increasing timeout is the
 detrimental effect it will have on the user experience, therefore can't
 increase it.
 I have looked at the queries that threw errors, next time I try it
 everything seems to work fine. Not sure how to reproduce the error.
 My concern with increasing the memory to 32GB is what happens when the
 index size grows over the next few months.
 One of the other solutions I have been thinking about is to rebuild
 index(weekly) and create a new collection and use it. Are there any good
 references for doing that?
 Thanks
 Jay
 
 On Sun, Aug 2, 2015 at 10:19 AM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 8/2/2015 8:29 AM, Jay Potharaju wrote:
 The document contains around 30 fields and have stored set to true for
 almost 15 of them. And these stored fields are queried and updated all
 the
 time. You will notice that the deleted documents is almost 30% of the
 docs.  And it has stayed around that percent and has not come down.
 I did try optimize but that was disruptive as it caused search errors.
 I have been playing with merge factor to see if that helps with deleted
 documents or not. It is currently set to 5.
 
 The server has 24 GB of memory out of which memory consumption is around
 23
 GB normally and the jvm is set to 6 GB. And have noticed that the
 available
 memory on the server goes to 100 MB at times during a day.
 All the updates are run through DIH.
 
 Using all availble memory is completely normal operation for ANY
 operating system.  If you hold up Windows as an example of one that
 doesn't ... it lies to you about available memory.  All modern
 operating systems will utilize memory that is not explicitly allocated
 for the OS disk cache.
 
 The disk cache will instantly give up any of the memory it is using for
 programs that request it.  Linux doesn't try to hide the disk cache from
 you, but older versions of Windows do.  In the newer versions of Windows
 that have the Resource Monitor, you can go there to see the actual
 memory usage including the cache.
 
 Every day at least once i see the following error, which result in search
 errors on the front end of the site.
 
 ERROR org.apache.solr.servlet.SolrDispatchFilter -
 null:org.eclipse.jetty.io.EofException
 
 From what I have read these are mainly due to timeout and my timeout is
 set
 to 30 seconds and cant set it to a higher number. I was thinking maybe
 due
 to high memory usage, sometimes it leads to bad performance/errors.
 
 Although this error can be caused by timeouts, it has a specific
 meaning.  It means that the client disconnected before Solr responded to
 the request, so when Solr tried to respond (through jetty), it found a
 closed TCP connection.
 
 Client timeouts need to either be completely removed, or set to a value
 much longer than any request will take.  Five minutes is a good starting
 value.
 
 If all your client timeout is set to 30 seconds and you are seeing
 EofExceptions, that means that your requests are taking longer than 30
 seconds, and you likely have some performance issues.  It's also
 possible that some of your client timeouts are set a lot shorter than 30
 seconds.
 
 My objective is to stop the errors, adding more memory to the server is
 not
 a good scaling strategy. That is why i was thinking maybe there is a
 issue
 with the way things are set up and need to be revisited.
 
 You're right that adding more memory to the servers is not a good
 scaling strategy for the general case ... but in this situation, I think
 it might be prudent.  For your index and heap sizes, I would want the
 company to pay for at least 32GB of RAM.
 
 Having said that ... I've seen Solr installs work well with a LOT less
 memory than the ideal.  I don't know that adding more memory is
 necessary, unless your system (CPU, storage, and memory speeds) is
 particularly slow.  Based on your document count and index size, your
 documents are quite small, so I think your memory size is probably good
 -- if the CPU, memory bus, and storage are very fast.  If one or more of
 those subsystems aren't fast, then make up the difference with lots of
 memory.
 
 Some light reading, where you will learn why I think 32GB is an ideal
 memory size for your system:
 
 https://wiki.apache.org/solr/SolrPerformanceProblems
 
 It is possible that your 6GB heap is not quite big enough for good
 performance, or that your GC is not well-tuned.  These topics are also
 discussed on that wiki page.  If you increase your heap size, then the
 likelihood of needing more memory in the system becomes greater, because
 there will be less memory available for the disk cache.
 
 Thanks,
 Shawn
 
 
 -- 
 Thanks
 Jay Potharaju
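
A hedged sketch of the rebuild-weekly idea Jay asks about, using the Collections API in Solr 4.8+ (collection and config names here are hypothetical): index into a fresh collection, then repoint an alias so client URLs never change.

  http://host:8983/solr/admin/collections?action=CREATE&name=listings_20150809&numShards=4&replicationFactor=2&collection.configName=listings
  (reindex into listings_20150809, then)
  http://host:8983/solr/admin/collections?action=CREATEALIAS&name=listings&collections=listings_20150809

Queries against the alias "listings" switch atomically to the new collection, and the old one can be removed with action=DELETE once verified. The same pattern covers Bill's separate-collection-per-month-or-year suggestion.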


Re: Division with Stats Component when Grouping in Solr

2015-06-13 Thread Bill Bell
It would be cool to be able to set 2 group by with facets 

 GROUP BY
site_id, keyword


Bill Bell
Sent from mobile


On Jun 13, 2015, at 2:28 PM, Yonik Seeley ysee...@gmail.com wrote:

 GROUP BY
site_id, keyword
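
As a sketch of one way to approximate the two-level GROUP BY with stock Solr (field names hypothetical), pivot faceting returns nested counts for site_id and then keyword, though not the stats themselves:

  q=*:*&rows=0&facet=true&facet.pivot=site_id,keyword&facet.limit=-1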


Re: Facet

2015-04-05 Thread Bill Bell
Ok

Clarification

The limit is set to -1. But the average result is 300. 

The amount of strings stored in the field increased a lot. Like 250k to 350k. 
But the amount coming out is limited by facet.prefix. 

Would creating 900 fields be better ? Then I could just put the prefix in the 
field name. Like this: proc_ps122

Thoughts ?

So far I have heard SolrCloud and docValues as viable solutions. Stay away from enum.

Bill Bell
Sent from mobile


 On Apr 5, 2015, at 2:56 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
 
 William Bell billnb...@gmail.com wrote:
 Sent: 05 April 2015 06:20
 To: solr-user@lucene.apache.org
 Subject: Facet
 
 We increased our number of terms (String) in a facet by 50,000.
 
 Do you mean facet.limit=50000?
 
 Now we are getting an error when we facet by this field - so we switched it 
 to
 facet.method=enum, and now the results come back. However, when we put
 it into production we literally hit a wall (CPU went to 100% for 16 cores)
 after about 30 minutes live.
 
 It was strange that enum worked. Internally, the difference between 
 facet.limit=100 and facet.limit=50000 is quite small. The real hits are for 
 fine-counting within SolrCloud and serializing the result in order to deliver 
 it to the client. I thought enum behaved the same as fc with regard to those 
 two.
 
 We tried adding more machines to reduce the CPU, but it did not help.
 
 Sounds like SolrCloud. More machines does not help here, it might even be 
 worse. What happens is that distributed faceting is two-phase, where the 
 second phase is fine-counting. The fine-counting essentially makes all shards 
 perform micro-searches for a large part of the terms returned: Your shards 
 are bogged down by tens of thousands of small searches.
 
 If you are feeling adventurous, you can try putting
 http://tokee.github.io/lucene-solr/
 on a test-installation (I am the author). It changes the way the 
 fine-counting is done.
 
 
 Depending on your container, you might need to raise the internal limits for 
 GET-communication. Tomcat has a default of 2MB somewhere (sorry, don't 
 remember the details), which is not a lot for 50,000 values.
 
 What are some ideas? We are going to try docValues on the field. Does
 anyone know if method=fc or method=enum works for docValue? I cannot find
 any documentation on that.
 
 If DocValues are enabled, fc will use them. It does not change anything for 
 enum. But I would argue against enum for anything in the thousands anyway.
 
 We are thinking of splitting the field into 2 fields (fielda, fieldb). At
 least the number will be less, but not sure if it will help memory?
 
 The killer is the number of terms requested/returned.
 
 The weird thing is for the first 30 minutes things are performing great.
 Literally at like 10% CPU across 16 cores, not much memory and normal GC.
 
 It might be because you have just been lucky. Take a look at
 https://twitter.com/anjacks0n/status/509284768035262464
 for how different performance can be for different result set sizes.
 
 Originally the facet was a method=fc. Is there an issue with enum? We have
 facet.threads=20 set, and not sure this is wise for a enum ?
 
 Facet threading does not thread within each field, it just means that 
 multiple fields are processed in parallel.
 
 - Toke Eskildsen
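
A minimal schema sketch for the docValues route discussed above (field name hypothetical); facet.method=fc picks up the DocValues structures automatically once the field is reindexed:

  <field name="specialty" type="string" indexed="true" stored="false" docValues="true"/>

Note that adding docValues requires a full reindex of that field.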


Re: ZFS File System for SOLR 3.6 and SOLR 4

2015-03-28 Thread Bill Bell
Is there an advantage for XFS over ext4 for Solr? Anyone done testing?

Bill Bell
Sent from mobile


 On Mar 27, 2015, at 8:14 AM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 3/27/2015 12:30 AM, abhi Abhishek wrote:
 i am trying to use ZFS as filesystem for my Linux Environment. are
 there any performance implications of using any filesystem other than
 ext-3/ext-4 with SOLR?
 
 That should work with no problem.
 
 The only time Solr tends to have problems is if you try to use a network
 filesystem.  As long as it's a local filesystem and it implements
 everything a program can typically expect from a local filesystem, Solr
 should work perfectly.
 
 Because of the compatibility problems that the license for ZFS has with
 the GPL, ZFS on Linux is probably not as well tested as other
 filesystems like ext4, xfs, or btrfs, but I have not heard about any big
 problems, so it's probably safe.
 
 Thanks,
 Shawn
 


Re: How to boost documents at index time?

2015-03-28 Thread Bill Bell
Issue a Jira ticket?

Did you try debugQuery ?

Bill Bell
Sent from mobile


 On Mar 28, 2015, at 1:49 AM, CKReddy Bhimavarapu chaitu...@gmail.com wrote:
 
 I want to boost docs at index time. I am doing this using the boost
 parameter on the doc element: <doc boost="2.0">.
 But I can't see a direct impact on the doc by using debugQuery.
 
 My question is that is there any other way to boost doc at index time and
 can see the reflected changes i.e direct impact.
 
 -- 
 ckreddybh. chaitu...@gmail.com
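
For reference, a minimal sketch of index-time boosting with the XML update format (field names and values hypothetical). Index-time boosts are folded into the field norm, so debugQuery only shows them indirectly through the fieldNorm factor, and only when the field has omitNorms="false":

  <add>
    <doc boost="2.0">
      <field name="id">doc1</field>
      <field name="title" boost="3.0">Some title text</field>
    </doc>
  </add>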


Re: Sort on multivalued attributes

2015-02-09 Thread Bill Bell
Definitely needed !!

Bill Bell
Sent from mobile


 On Feb 9, 2015, at 5:51 AM, Jan Høydahl jan@cominvent.com wrote:
 
 Sure, vote for it. Number of votes do not directly make prioritized sooner.
 So you better also add a comment to the JIRA, it will raise committer's 
 attention.
 Even better of course is if you are able to help bring the issue forward by 
 submitting patches.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 
 9. feb. 2015 kl. 12.15 skrev Flavio Pompermaier pomperma...@okkam.it:
 
 Do I have to vote for it..?
 
 On Mon, Feb 9, 2015 at 11:50 AM, Jan Høydahl jan@cominvent.com wrote:
 
 See https://issues.apache.org/jira/browse/SOLR-2522
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 
 9. feb. 2015 kl. 10.30 skrev Flavio Pompermaier pomperma...@okkam.it:
 
 In my use case it could be very helpful because I use the SIREn plugin to
 index arbitrary JSON-LD and this plugin automatically index also all
 nested
 attributes as a Solr field.
 Thus I need for example to gather all entries with a certain value of the
 type attribute, ordered by name (but name could be a multivalued
 attribute in my use case :( )
 I'd like to avoid to switch to Elasticsearch just to have this single
 feature.
 
 Thanks for the support,
 Flavio
 
 On Mon, Feb 9, 2015 at 10:02 AM, Anshum Gupta ans...@anshumgupta.net
 wrote:
 
 Sure, that's correct and makes sense in some use cases. I'll need to
 check
 if Solr functions support such a thing.
 
 On Mon, Feb 9, 2015 at 12:47 AM, Flavio Pompermaier 
 pomperma...@okkam.it
 wrote:
 
 I saw that this is possible in Lucene (
 https://issues.apache.org/jira/browse/LUCENE-5454) and also in
 Elasticsearch. Or am I wrong?
 
 On Mon, Feb 9, 2015 at 9:05 AM, Anshum Gupta ans...@anshumgupta.net
 wrote:
 
 Unless I'm missing something here, sorting on a multi-valued field
 would
 be
 non-deterministic in nature.
 
 On Sun, Feb 8, 2015 at 11:59 PM, Flavio Pompermaier 
 pomperma...@okkam.it
 wrote:
 
 Hi to all,
 
 Is there any possibility that in the near future Solr could support
 sorting
 on multivalued fields?
 
 Best,
 Flavio
 
 
 
 --
 Anshum Gupta
 http://about.me/anshumgupta
 
 
 
 --
 Anshum Gupta
 http://about.me/anshumgupta
 


Re: Collations are not working fine.

2015-02-09 Thread Bill Bell
Can you order the collations by highest to lowest hits?

Bill Bell
Sent from mobile


 On Feb 9, 2015, at 6:47 AM, Nitin Solanki nitinml...@gmail.com wrote:
 
 I am working on spell checking in Solr. I have implemented Suggestions and
 collations in my spell checker component.
 
 Most of the time collations work fine but in few case it fails.
 
 *Working*:
 I tried query:*gone wthh thes wnd*: In this wnd doesn't give suggestion
 wind but collation is coming right = gone with the wind, hits = 117
 
 
 *Not working:*
 But when I tried query: *gone wthh thes wint*: In this wint does give
 suggestion wind but collation is not coming right. Instead of gone with
 the wind it gives gone with the west, hits = 1.
 
 And I want to also know what is *hits* in collations.
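
For context, a sketch of the collation-related parameters (values only illustrative). With collateExtendedResults=true, the hits number is how many documents the collated query would return; as far as I know Solr emits collations in the order they are tried, so sorting them by hits would have to happen client-side:

  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollations">5</str>
  <str name="spellcheck.maxCollationTries">10</str>
  <str name="spellcheck.collateExtendedResults">true</str>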


Re: How large is your solr index?

2015-01-03 Thread Bill Bell
For Solr 5 why don't we switch it to 64 bit ??

Bill Bell
Sent from mobile


 On Dec 29, 2014, at 1:53 PM, Jack Krupansky jack.krupan...@gmail.com wrote:
 
 And that Lucene index document limit includes deleted and updated
 documents, so even if your actual document count stays under 2^31-1,
 deleting and updating documents can push the apparent document count over
 the limit unless you very aggressively merge segments to expunge deleted
 documents.
 
 -- Jack Krupansky
 
 -- Jack Krupansky
 
 On Mon, Dec 29, 2014 at 12:54 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 When you say 2B docs on a single Solr instance, are you talking only one
 shard?
 Because if you are, you're very close to the absolute upper limit of a
 shard, internally
 the doc id is an int or 2^31. 2^31 + 1 will cause all sorts of problems.
 
 But yeah, your 100B documents are going to use up a lot of servers...
 
 Best,
 Erick
 
 On Mon, Dec 29, 2014 at 7:24 AM, Bram Van Dam bram.van...@intix.eu
 wrote:
 Hi folks,
 
 I'm trying to get a feel of how large Solr can grow without slowing down
 too
 much. We're looking into a use-case with up to 100 billion documents
 (SolrCloud), and we're a little afraid that we'll end up requiring 100
 servers to pull it off.
 
 The largest index we currently have is ~2billion documents in a single
 Solr
 instance. Documents are smallish (5k each) and we have ~50 fields in the
 schema, with an index size of about 2TB. Performance is mostly OK. Cold
 searchers take a while, but most queries are alright after warming up. I
 wish I could provide more statistics, but I only have very limited
 access to
 the data (...banks...).
 
 I'd very grateful to anyone sharing statistics, especially on the larger
 end
 of the spectrum -- with or without SolrCloud.
 
 Thanks,
 
 - Bram
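
To see how close a shard is to the 2^31-1 limit including deleted documents, the Luke request handler reports both numDocs and maxDoc (core name hypothetical):

  http://localhost:8983/solr/collection1/admin/luke?numTerms=0&wt=json

maxDoc counts deleted-but-not-yet-merged documents, which is the number that has to stay under the Lucene per-shard limit.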
 


Re: Old facet value doesn't go away after index update

2014-12-19 Thread Bill Bell
Set mincount=1

Bill Bell
Sent from mobile


 On Dec 19, 2014, at 12:22 PM, Tang, Rebecca rebecca.t...@ucsf.edu wrote:
 
 Hi there,
 
 I have an index that has a field called collection_facet.
 
 There was a value 'Ness Motley Law Firm Documents' that we wanted to update 
 to 'Ness Motley Law Firm'.  There were 36,132 records with this value.  So I 
 re-indexed just the 36,132 records.  After the update, I ran a facet query 
 (q=*:*&facet=true&facet.field=collection_facet) to see if the value got 
 updated and I saw
 Ness Motley Law Firm 36,132  -- as expected
 Ness Motley Law Firm Documents 0 — Why is this value still here even though 
 clearly there are no records with this value anymore?  I thought maybe it was 
 cached, so I restarted solr, but I still got the same results.
 
 facet_fields: { collection_facet: [
 … Ness Motley Law Firm, 36132,
 … Ness Motley Law Firm Documents, 0 ]
 
 
 
 Rebecca Tang
 Applications Developer, UCSF CKM
 Legacy Tobacco Document Library | legacy.library.ucsf.edu/
 E: rebecca.t...@ucsf.edu
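
A minimal sketch of the suggested fix for the query above: zero-count buckets for values that no longer exist in the index are suppressed with facet.mincount:

  q=*:*&facet=true&facet.field=collection_facet&facet.mincount=1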


Re: Solr Dynamic Field Performance

2014-09-14 Thread Bill Bell
How about perf if you dynamically create 5000 fields ?

Bill Bell
Sent from mobile


 On Sep 14, 2014, at 10:06 AM, Erick Erickson erickerick...@gmail.com wrote:
 
 Dynamic fields, once they are actually _in_ a document, aren't any
 different than statically defined fields. Literally, there's no place
 in the search code that I know of that _ever_ has to check
 whether a field was dynamically or statically defined.
 
 AFAIK, the only additional cost would be figuring out which pattern
 matched at index time, which is such a tiny portion of the cost of
 indexing that I doubt you could measure it.
 
 Best,
 Erick
 
 On Sun, Sep 14, 2014 at 7:58 AM, Saumitra Srivastav
 saumitra.srivast...@gmail.com wrote:
 I have a collection with 200 fields and 300M docs running in cloud mode.
 Each doc have around 20 fields. I now have a use case where I need to
 replace these explicit fields with 6 dynamic fields. Each of these 200
 fields will match one of the 6 dynamic field.
 
 I am evaluating performance implications of switching to dynamicFields. I
 have tested with a smaller dataset(5M docs) but didn't noticed any indexing
 or query performance degradation.
 
 Query on dynamic fields will either be faceting, range query or full text
 search.
 
 Are there any known performance issues with using dynamicFields instead of
 explicit ones?
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Dynamic-Field-Performance-tp4158737.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to solve?

2014-09-06 Thread Bill Bell
Yeah we already use it. I will try to create a custom function... if I get it
to work I will post.

The challenge for me is how to dynamically match and add them based on the 
faceting.

Here is a better example.

The doctor core has payloads as name:val. The names are doctor specialties. I 
need to pull back by the name since the user faceted on a specialty. So far 
payloads work. But the user now wants to facet on another specialty. For 
example they are looking for a cardiologist and an internal medicine doctor and 
if the doctor practices at the same hospital I need to take the values and add 
them. Else take the max value for the 2 specialties. 

Make sense now ?

Seems like I need to create a payload and my own custom function.

Bill Bell
Sent from mobile


 On Sep 6, 2014, at 12:57 PM, Erick Erickson erickerick...@gmail.com wrote:
 
 Here's a blog with an end-to-end example. Jack's right, it takes some
 configuration and having first-class support in Solr would be a good
 thing...
 
 http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/
 
 Best,
 Erick
 
 On Sat, Sep 6, 2014 at 10:24 AM, Jack Krupansky j...@basetechnology.com 
 wrote:
 Payload really don't have first class support in Solr. It's a solid feature
 of Lucene, but never expressed well in Solr. Any thoughts or proposals are
 welcome!
 
 (Hmmm... I wonder what the good folks at Heliosearch have up their sleeves
 in this area?!)
 
 -- Jack Krupansky
 
 -Original Message- From: William Bell
 Sent: Friday, September 5, 2014 10:03 PM
 To: solr-user@lucene.apache.org
 Subject: How to solve?
 
 
 We have a core with each document as a person.
 
 We want to boost based on the sweater color, but if the person has sweaters
 in their closet which are the same manufactuer we want to boost even more
 by adding them together.
 
 Peter Smit - Sweater: Blue = 1 : Nike, Sweater: Red = 2: Nike, Sweater:
 Blue=1 : Polo
 Tony S - Sweater: Red =2: Nike
 Bill O - Sweater:Red = 2: Polo, Blue=1: Polo
 
 Scores:
 
 Peter Smit - 1+2 = 3.
 Tony S - 2
 Bill O - 2 + 1
 
 I thought about using payloads.
 
 sweaters_payload
 Blue: Nike: 1
 Red: Nike: 2
 Blue: Polo: 1
 
 How do I query this?
 
 http://localhost:8983/solr/persons?q=*:*sort=??
 
 Ideas?
 
 
 
 
 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076
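
For reference, a hedged sketch of a payload field type along the lines of the linked blog post (field and type names hypothetical). Values would be indexed as token|weight pairs, e.g. Blue_Nike|1.0 Red_Nike|2.0; reading the payloads back into the score still needs a custom similarity or function query, as discussed above:

  <fieldType name="payloads" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float" delimiter="|"/>
    </analyzer>
  </fieldType>
  <field name="sweaters_payload" type="payloads" indexed="true" stored="true"/>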


Re: embedded documents

2014-08-24 Thread Bill Bell
See my Jira. It supports it via json.fsuffix=_json&wt=json

http://mail-archives.apache.org/mod_mbox/lucene-dev/201304.mbox/%3CJIRA.12641293.1365394604231.125944.1365397875874@arcas%3E

Bill Bell
Sent from mobile


 On Aug 24, 2014, at 6:43 AM, Jack Krupansky j...@basetechnology.com wrote:
 
 Indexing and query of raw JSON would be a valuable addition to Solr, so maybe 
 you could simply explain more precisely your data model and transformation 
 rules. For example, when multi-level nesting occurs, what does your loader do?
 
 Maybe if the fielld names were derived by concatenating the full path of JSON 
 key names, like titles_json.FR, field_naming nesting could be handled in a 
 fully automated manner.
 
 I had been thinking of filing a Jira proposing exactly that, so that even the 
 most deeply nested JSON maps could be supported, although combinations of 
 arrays and maps would be problematic.
 
 -- Jack Krupansky
 
 -Original Message- From: Michael Pitsounis
 Sent: Wednesday, August 20, 2014 7:14 PM
 To: solr-user@lucene.apache.org
 Subject: embedded documents
 
 Hello everybody,
 
 I had a requirement to store complicated json documents in solr.
 
 i have modified the JsonLoader to accept complicated json documents with
 arrays/objects as values.
 
 It stores the object/array and then flatten it and  indexes the fields.
 
 e.g  basic example document
 
  {
    "titles_json": {"FR": "This is the FR title", "EN": "This is the EN title"},
    "id": "103",
    "guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6"
  }

  It will store titles_json: {"FR": "This is the FR title", "EN": "This is the EN title"}
  and then index the fields

  titles.FR: "This is the FR title"
  titles.EN: "This is the EN title"
 
 
 Do you see any problems with this approach?
 
 
 
 Regards,
 Michael Pitsounis 
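
Purely as an illustration of the path-concatenation naming Jack suggests (values hypothetical), a two-level nested document such as:

  {
    "titles_json": {"FR": {"short": "Titre", "long": "Titre complet"}},
    "id": "104"
  }

could flatten to the indexed fields

  titles_json.FR.short: "Titre"
  titles_json.FR.long: "Titre complet"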


Re: SolrCloud Scale Struggle

2014-08-02 Thread Bill Bell
Seems way overkill. Are you using /get at all ? If you need the docs avail 
right away - why ? How about after 30 seconds ? How many docs do you get added 
per second during peak ? Even Google has a delay when you do Adwords. 

One idea is yo have an empty core that you insert into and then shard into the 
queries. So one fire would be called newdocs and then you would add this core 
into your query. There are a couple issues with this with scoring but it works 
nicely. I would not even use Solrcloud for that core.

Try to reduce number of Java running. Reduce memory and use one java per 
machine. 

Then if you need faster avail if docs you really need to ask why. Why not 
later? If it got search or just showing the user the info ? If for showing 
maybe query a not indexes table for the few not yet indexed ?? Or just store in 
a db to show the user the info and index later?

Bill Bell
Sent from mobile


 On Aug 1, 2014, at 4:19 AM, anand.mahajan an...@zerebral.co.in wrote:
 
 Hello all,
 
 Struggling to get this going with SolrCloud - 
 
 Requirement in brief :
 - Ingest about 4M Used Cars listings a day and track all unique cars for
 changes
 - 4M automated searches a day (during the ingestion phase to check if a doc
 exists in the index (based on values of 4-5 key fields) or it is a new one
 or an updated version)
 - Of the 4 M - About 3M Updates to existing docs (for every non-key value
 change)
 - About 1M inserts a day (I'm assuming these many new listings come in
 every day)
 - Daily Bulk CSV exports of inserts / updates in last 24 hours of various
 snapshots of the data to various clients
 
 My current deployment : 
 i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
 - 24 Core + 96 GB RAM each.
 ii)There are over 190M docs in the SolrCloud at the moment (for all
 replicas its consuming overall disk 2340GB which implies - each doc is at
 about 5-8kb in size.)
 iii) The docs are split into 36 Shards - and 3 replica per shard (in all
 108 Solr Jetty processes split over 6 Servers leaving about 18 Jetty JVMs
 running on each host)
 iv) There are 60 fields per doc and all fields are stored at the moment  :( 
 (The backend is only Solr at the moment)
 v) The current shard/routing key is a combination of Car Year, Make and
 some other car level attributes that help classify the cars
 vi) We are mostly using the default Solr config as of now - no heavy caching
 as the search is pretty random in nature 
 vii) Autocommit is on - with maxDocs = 1
 
 Current throughput  Issues :
 With the above mentioned deployment the daily throughout is only at about
 1.5M on average (Inserts + Updates) - falling way short of what is required.
 Search is slow - Some queries take about 15 seconds to return - and since
 insert is dependent on at least one Search that degrades the write
 throughput too. (This is not a Solr issue - but the app demands it so)
 
 Questions :
 
 1. Autocommit with maxDocs = 1 - is that a goof up and could that be slowing
 down indexing? Its a requirement that all docs are available as soon as
 indexed.
 
 2. Should I have been better served had I deployed a Single Jetty Solr
 instance per server with multiple cores running inside? The servers do start
 to swap out after a couple of days of Solr uptime - right now we reboot the
 entire cluster every 4 days.
 
 3. The routing key is not able to effectively balance the docs on available
 shards - There are a few shards with just about 2M docs - and others over
 11M docs. Shall I split the larger shards? But I do not have more nodes /
 hardware to allocate to this deployment. In such case would splitting up the
 large shards give better read-write throughput? 
 
 4. To remain with the current hardware - would it help if I remove 1 replica
 each from a shard? But that would mean even when just 1 node goes down for a
 shard there would be only 1 live node left that would not serve the write
 requests.
 
 5. Also, is there a way to control where the Split Shard replicas would go?
 Is there a pattern / rule that Solr follows when it creates replicas for
 split shards?
 
 6. I read somewhere that creating a Core would cost the OS one thread and a
  file handle. Since a core represents an index in its entirety, would it not be
  allocated the configured number of write threads? (The default, that is, 8)
 
 7. The Zookeeper cluster is deployed on the same boxes as the Solr instance
 - Would separating the ZK cluster out help?
 
 Sorry for the long thread _ I thought of asking these all at once rather
 than posting separate ones.
 
 Thanks,
 Anand
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592.html
 Sent from the Solr - User mailing list archive at Nabble.com.
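
A hedged solrconfig.xml sketch of the commit setup implied by the advice above (times are only illustrative): replace the per-document hard commit (maxDocs=1) with a time-based hard commit that does not open a searcher, plus a soft commit for near-real-time visibility:

  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>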


Re: SolrCloud Scale Struggle

2014-08-02 Thread Bill Bell
Auto correct not good

Corrected below 

Bill Bell
Sent from mobile


 On Aug 2, 2014, at 11:11 AM, Bill Bell billnb...@gmail.com wrote:
 
 Seems way overkill. Are you using /get at all ? If you need the docs avail 
 right away - why ? How about after 30 seconds ? How many docs do you get 
 added per second during peak ? Even Google has a delay when you do Adwords. 
 
 One idea is to have an empty core that you insert into and then shard into 
 the queries. So one core would be called newdocs and then you would add this 
 core into your query. There are a couple issues with this with scoring but it 
 works nicely. I would not even use Solrcloud for that core.
 
 Try to reduce number of Java instances running. Reduce memory and use one 
 java per machine. 
 
 Then if you need faster avail of docs you really need to ask why. Why not 
 later? Do you need search or just showing the user the info ? If for showing 
 maybe query a indexed table for the few not yet indexed ?? Or just store in a 
 db to show the user the info and index later?
 
 Bill Bell
 Sent from mobile
 
 
 On Aug 1, 2014, at 4:19 AM, anand.mahajan an...@zerebral.co.in wrote:
 
 Hello all,
 
 Struggling to get this going with SolrCloud - 
 
 Requirement in brief :
 - Ingest about 4M Used Cars listings a day and track all unique cars for
 changes
 - 4M automated searches a day (during the ingestion phase to check if a doc
 exists in the index (based on values of 4-5 key fields) or it is a new one
 or an updated version)
 - Of the 4 M - About 3M Updates to existing docs (for every non-key value
 change)
 - About 1M inserts a day (I'm assuming these many new listings come in
 every day)
 - Daily Bulk CSV exports of inserts / updates in last 24 hours of various
 snapshots of the data to various clients
 
 My current deployment : 
 i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
 - 24 Core + 96 GB RAM each.
 ii)There are over 190M docs in the SolrCloud at the moment (for all
 replicas its consuming overall disk 2340GB which implies - each doc is at
 about 5-8kb in size.)
 iii) The docs are split into 36 Shards - and 3 replica per shard (in all
 108 Solr Jetty processes split over 6 Servers leaving about 18 Jetty JVMs
 running on each host)
 iv) There are 60 fields per doc and all fields are stored at the moment  :( 
 (The backend is only Solr at the moment)
 v) The current shard/routing key is a combination of Car Year, Make and
 some other car level attributes that help classify the cars
 vi) We are mostly using the default Solr config as of now - no heavy caching
 as the search is pretty random in nature 
 vii) Autocommit is on - with maxDocs = 1
 
 Current throughput  Issues :
 With the above mentioned deployment the daily throughout is only at about
 1.5M on average (Inserts + Updates) - falling way short of what is required.
 Search is slow - Some queries take about 15 seconds to return - and since
 insert is dependent on at least one Search that degrades the write
 throughput too. (This is not a Solr issue - but the app demands it so)
 
 Questions :
 
 1. Autocommit with maxDocs = 1 - is that a goof up and could that be slowing
 down indexing? Its a requirement that all docs are available as soon as
 indexed.
 
 2. Should I have been better served had I deployed a Single Jetty Solr
 instance per server with multiple cores running inside? The servers do start
 to swap out after a couple of days of Solr uptime - right now we reboot the
 entire cluster every 4 days.
 
 3. The routing key is not able to effectively balance the docs on available
 shards - There are a few shards with just about 2M docs - and others over
 11M docs. Shall I split the larger shards? But I do not have more nodes /
 hardware to allocate to this deployment. In such case would splitting up the
 large shards give better read-write throughput? 
 
 4. To remain with the current hardware - would it help if I remove 1 replica
 each from a shard? But that would mean even when just 1 node goes down for a
 shard there would be only 1 live node left that would not serve the write
 requests.
 
 5. Also, is there a way to control where the Split Shard replicas would go?
 Is there a pattern / rule that Solr follows when it creates replicas for
 split shards?
 
 6. I read somewhere that creating a Core would cost the OS one thread and a
  file handle. Since a core represents an index in its entirety, would it not be
  allocated the configured number of write threads? (The default, that is, 8)
 
 7. The Zookeeper cluster is deployed on the same boxes as the Solr instance
 - Would separating the ZK cluster out help?
 
 Sorry for the long thread _ I thought of asking these all at once rather
 than posting separate ones.
 
 Thanks,
 Anand
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Latest jetty

2014-07-26 Thread Bill Bell
Since we are now on latest Java JDK can we move to Jetty 9?

Thoughts ?

Bill Bell
Sent from mobile



Re: stucked with log4j configuration

2014-04-12 Thread Bill Bell
Well I hope log4j2 is something Solr supports when GA

Bill Bell
Sent from mobile


 On Apr 12, 2014, at 7:26 AM, Aman Tandon amantandon...@gmail.com wrote:
 
 I have upgraded my solr4.2 to solr 4.7.1 but in my logs there is an error
 for log4j
 
 log4j: Could not find resource
 
 Please find the attachment of the screenshot of the error console
 https://drive.google.com/file/d/0B5GzwVkR3aDzdjE1b2tXazdxcGs/edit?usp=sharing
 -- 
 With Regards
 Aman Tandon


Re: boost results within 250km

2014-04-09 Thread Bill Bell
Just take geodist and use the map function and send to bf or boost 

Bill Bell
Sent from mobile


 On Apr 9, 2014, at 8:26 AM, Erick Erickson erickerick...@gmail.com wrote:
 
 Why do you want to do this? This sounds like an XY problem, you're
 asking how to do something specific without explaining why you care,
 perhaps there are other ways to do this.
 
 Best,
 Erick
 
 On Tue, Apr 8, 2014 at 11:30 PM, Aman Tandon amantandon...@gmail.com wrote:
 How can i gave the more boost to the results within 250km than others
 without using result filtering.
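
A minimal sketch of the geodist()/map() approach Bill describes (field name, point and weights are hypothetical): map(geodist(),0,250,2,1) yields 2 inside 250 km and 1 otherwise, and edismax's boost multiplies that into the score without filtering anything out:

  q=pizza&defType=edismax&sfield=store&pt=45.15,-93.85&boost=map(geodist(),0,250,2,1)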


Re: Luke 4.6.1 released

2014-02-16 Thread Bill Bell
Yes it works with Solr 

Bill Bell
Sent from mobile


 On Feb 16, 2014, at 3:38 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
 
 Does it work with Solr? I couldn't tell what the description was from
 this repo and it's Solr relevance.
 
 I am sure all the long timers know, but for more recent Solr people,
 the additional information would be useful.
 
 Regards,
   Alex.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
 
 
 On Mon, Feb 17, 2014 at 3:02 AM, Dmitry Kan solrexp...@gmail.com wrote:
 Hello!
 
 Luke 4.6.1 has been just released. Grab it here:
 
 https://github.com/DmitryKey/luke/releases/tag/4.6.1
 
 fixes:
 loading the jar from command line is now working fine.
 
 --
 Dmitry Kan
 Blog: http://dmitrykan.blogspot.com
 Twitter: twitter.com/dmitrykan


Status of 4.6.1?

2014-01-18 Thread Bill Bell
We just need the bug fix for Solr.xml 

https://issues.apache.org/jira/browse/SOLR-5543

Bill Bell
Sent from mobile



Re: Call to Solr via TCP

2013-12-10 Thread Bill Bell
Yeah, open a socket to the port and send the correct GET syntax and Solr will respond with 
results...



Bill Bell
Sent from mobile


 On Dec 10, 2013, at 2:50 PM, Doug Turnbull 
 dturnb...@opensourceconnections.com wrote:
 
 Zwer, is there a reason you need to do this? Its probably very hard to
 get solr to speak TCP. But if you're having a performance or
 infrastructure problem, the group might be able to help you with a far
 simpler solution.
 
 Sent from my Windows Phone From: Zwer
 Sent: 12/10/2013 12:15 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Call to Solr via TCP
 Maybe I asked incorrectly.
 
 
 Solr is Web Application, hosted by some servlet container and is reachable
 via HTTP.
 
 HTTP is an extension of TCP and I would like to know whether exists some
 lower way to communicate with application (i.e. Solr) hosted by Jetty?
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Call-to-Solr-via-TCP-tp4105932p4105935.html
 Sent from the Solr - User mailing list archive at Nabble.com.
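
As an illustration of "open a socket and send the correct GET syntax" (core name hypothetical), this is all a raw TCP client would need to write, followed by a blank line:

  GET /solr/collection1/select?q=*:*&wt=json HTTP/1.1
  Host: localhost:8983
  Connection: close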


Re: How to work with remote solr savely?

2013-11-22 Thread Bill Bell
Do you have a sample jetty XML to setup basic auth for updates in Solr?

Sent from my iPad

 On Nov 22, 2013, at 7:34 AM, michael.boom my_sky...@yahoo.com wrote:
 
 Use HTTP basic authentication, setup in your servlet container
 (jetty/tomcat).
 
 That should work fine if you are *not* using SolrCloud.
 
 
 
 -
 Thanks,
 Michael
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102613.html
 Sent from the Solr - User mailing list archive at Nabble.com.
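
An untested sketch of what that could look like for the Jetty shipped with Solr 4.x (realm, role, user and core names are all hypothetical): register a login service in etc/jetty.xml and constrain the update path in the Solr webapp's web.xml:

  <!-- etc/jetty.xml -->
  <Call name="addBean">
    <Arg>
      <New class="org.eclipse.jetty.security.HashLoginService">
        <Set name="name">solr-realm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      </New>
    </Arg>
  </Call>

  <!-- etc/realm.properties would hold: admin: s3cr3t, update-role -->

  <!-- web.xml -->
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr updates</web-resource-name>
      <url-pattern>/core1/update/*</url-pattern>
    </web-resource-collection>
    <auth-constraint><role-name>update-role</role-name></auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>solr-realm</realm-name>
  </login-config>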


Re: useColdSearcher in SolrCloud config

2013-11-22 Thread Bill Bell
Wouldn't true mean "use the cold searcher"? It seems backwards to me...

Sent from my iPad

 On Nov 22, 2013, at 2:44 AM, ade-b adrian.bro...@gmail.com wrote:
 
 Hi
 
 The definition of useColdSearcher config element in solrconfig.xml is
 
 If a search request comes in and there is no current registered searcher,
 then immediately register the still warming searcher and use it.  If false
 then all requests will block until the first searcher is done warming.
 
 By the term 'block', I assume SOLR returns a non 200 response to requests.
 Does anybody know the exact response code returned when the server is
 blocking requests?
 
 If a new SOLR server is introduced into an existing array of SOLR servers
 (in SOLR Cloud setup), it will sync it's index from the leader. To save you
 having to specify warm-up queries in the solrconfig.xml file for first
 searchers, would/could the new server not auto warm it's caches from the
 caches of an existing server?
 
 Thanks
 Ade 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/useColdSearcher-in-SolrCloud-config-tp4102569.html
 Sent from the Solr - User mailing list archive at Nabble.com.
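
To restate the semantics in config terms (the warming query is just an example): true means "serve requests from the still-warming (cold) searcher immediately", false means "block until the first searcher has finished warming". A newly introduced node can pre-warm itself with a firstSearcher listener rather than copying another node's caches:

  <useColdSearcher>false</useColdSearcher>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str></lst>
    </arr>
  </listener>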


Re: NullPointerException

2013-11-22 Thread Bill Bell
It seems to be a modified row and referenced in EvaluatorBag.

I am not familiar with either.

Sent from my iPad

 On Nov 22, 2013, at 3:05 AM, Adrien RUFFIE a.ruf...@e-deal.com wrote:
 
 Hello all,
 
 I have perform a full indexation with solr, but when I try to perform an 
 incrementation indexation I get the following exception (cf attachment).
 
 Any one have a idea of the problem ?
 
 Greate thank
 log.txt


Re: Reverse mm(min-should-match)

2013-11-22 Thread Bill Bell
This is an awesome idea!

Sent from my iPad

 On Nov 22, 2013, at 12:54 PM, Doug Turnbull 
 dturnb...@opensourceconnections.com wrote:
 
 Instead of specifying a percentage or number of query terms must match
 tokens in a field, I'd like to do the opposite -- specify how much of a
 field must match a query.
 
 The problem I'm trying to solve is to boost document titles that closely
 match the query string. If a title looks something like
 
 *Title: *[solr] [the] [worlds] [greatest] [search] [engine]
 
 I want to be able to specify how much of the field must match the query
 string. This differs from normal mm. Normal mm specifies a how much of the
 query must match a field.
 
 As an example, with this title, if I use normal mm=100% and perform the
 following query:
 
 mm=100%
 q=solr
 
 This will match the title above, as 100% of [solr] matches the field
 
 What I really want to get at is a reverse mm:
 
 Rmm=100%
 q=solr
 
 The title above will not match in this case. Only 1/6 of the tokens in the
 field match the query.
 
 However an exact search would match:
 
 Rmm=100%
 q=solr the worlds greatest search engine
 
 Here 100% of the query matches the title, so I'm good.
 
 Is there any way to achieve this in Solr?
 
 -- 
 Doug Turnbull
 Search  Big Data Architect
 OpenSource Connections http://o19s.com


Re: Jetty 9?

2013-11-07 Thread Bill Bell
So no Jetty 9 until Solr 5? Java 7 is at release 40... Is that our commitment, to 
not require Java 7 until Solr 5? 

Most people are probably already on Java 7...

Bill Bell
Sent from mobile


 On Nov 7, 2013, at 1:29 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 
 Here is an issue points to that:
 https://issues.apache.org/jira/browse/SOLR-4839
 
 
 2013/11/7 William Bell billnb...@gmail.com
 
 When are we moving Solr to Jetty 9?
 
 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076
 


Re: Performance of rows and start parameters

2013-11-04 Thread Bill Bell
Do you want to look through them all? Have you considered the Lucene API? Not sure if 
that is better, but it might be.

Bill Bell
Sent from mobile


 On Nov 4, 2013, at 6:43 AM, michael.boom my_sky...@yahoo.com wrote:
 
 I saw that some time ago there was a JIRA ticket dicussing this, but still i
 found no relevant information on how to deal with it.
 
 When working with big nr of docs (e.g. 70M) in my case, I'm using
 start=0rows=30 in my requests.
 For the first req the query time is ok, the next one is visibily slower, the
 third even more slow and so on until i get some huge query times of up
 140secs, after a few hundreds requests. My test were done with SolrMeter at
 a rate of 1000qpm. Same thing happens at 100qpm, though.
 
 Is there a best practice on how to do in this situation, or maybe an
 explanation why is the query time increasing, from request to request ?
 
 Thanks!
 
 
 
 -
 Thanks,
 Michael
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Performance-of-rows-and-start-parameters-tp4099194.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Core admin: create new core

2013-11-04 Thread Bill Bell
You could pre-create a bunch of directories and base configs, and create cores as needed. 
Then use the schemaless API to set them up... Or make changes in a script and 
reload the core.

Bill Bell
Sent from mobile


 On Nov 4, 2013, at 6:06 AM, Erick Erickson erickerick...@gmail.com wrote:
 
 Right, this has been an issue for a while, there's no current
 way to do this.
 
 Someday, I'll be able to work on SOLR-4779 which should
 go some toward making this work more easily. It's still not
 exactly what you're looking for, but it might work.
 
 Of course with SolrCloud you can specify a configuration
 set that is used for multiple collections.
 
 People are using Puppet or similar to automate this over
 large numbers of nodes, but that's not entirely satisfactory
 either in our case I suspect.
 
 FWIW,
 Erick
 
 
 On Mon, Nov 4, 2013 at 4:00 AM, Bram Van Dam bram.van...@intix.eu wrote:
 
 The core admin CREATE function requires that the new instance dir and
 schema/config exist already. Is there a particular reason for this? It
 would be incredible convenient if I could create a core with a new schema
 and new config simply by calling CREATE (maybe providing the contents of
 config.xml and schema.xml as base64 encoded strings in HTTP POST or
 something?).
 
 I'm guessing this isn't currently possible?
 
 Ta,
 
 - bram
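
For reference, a sketch of the CREATE call against a pre-created instanceDir that already holds conf/solrconfig.xml and conf/schema.xml (names hypothetical):

  http://localhost:8983/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml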
 


Re: Proposal for new feature, cold replicas, brainstorming

2013-10-27 Thread Bill Bell
Yeah replicate to a DR site would be good too. 

Bill Bell
Sent from mobile


 On Oct 24, 2013, at 6:27 AM, yriveiro yago.rive...@gmail.com wrote:
 
 I'm wondering some time ago if it's possible have replicas of a shard
 synchronized but in an state that they can't accept queries only updates. 
 
 This replica in replication mode only awake to accept queries if it's the
 last alive replica and goes to replication mode when other replica becomes
 alive and synchronized.
 
 The motivation of this is simple, I want have replication but I don't want
 have n replicas actives with full resources allocated (cache and so on).
 This is usefull in enviroments where replication is needed but a high query
 throughput is not fundamental and the resources are limited.
 
 I know that right now is not possible, but I think that it's a feature that
 can be implemented in a easy way creating a new status for shards.
 
 The bottom line question is, I'm the only one with this kind of
 requeriments? Does it make sense one functionality like this?
 
 
 
 -
 Best regards
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Proposal-for-new-feature-cold-replicas-brainstorming-tp4097501.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - what's the next big thing?

2013-10-26 Thread Bill Bell
Full JSON support: deep complex object indexing and search. Game changer. 

Bill Bell
Sent from mobile


 On Oct 26, 2013, at 1:04 PM, Otis Gospodnetic otis.gospodne...@gmail.com 
 wrote:
 
 Hi,
 
 On Sat, Oct 26, 2013 at 5:58 AM, Saar Carmi saarca...@gmail.com wrote:
 LOL,  Jack.  I can imagine Otis saying that.
 
 Funny indeed, but not really.
 
 Otis,  with these marriage,  are we going to see map reduce based queries?
 
 Can you please describe what you mean by that?  Maybe with an example.
 
 Thanks,
 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
 Solr  Elasticsearch Support * http://sematext.com/
 
 
 
 On Oct 25, 2013 10:03 PM, Jack Krupansky j...@basetechnology.com wrote:
 
 But a lot of that big yellow elephant stuff is in 4.x anyway.
 
 (Otis: I was afraid that you were going to say that the next big thing in
 Solr is... Elasticsearch!)
 
 -- Jack Krupansky
 
 -Original Message- From: Otis Gospodnetic
 Sent: Friday, October 25, 2013 2:43 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr - what's the next big thing?
 
 Saar,
 
 The marriage with the big yellow elephant is a big deal. It changes the
 scale.
 
 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On Oct 25, 2013 5:32 AM, Saar Carmi saarca...@gmail.com wrote:
 
 If I am not mistaken the most impressive improvement of Solr 4.0 compared
 to previous versions was the Solr Cloud architecture.
 
 What would be the next big thing in Solr 5.0 ?
 
 Saar
 


Re: Spatial Distance Range

2013-10-22 Thread Bill Bell
Yes frange works 

Bill Bell
Sent from mobile


 On Oct 22, 2013, at 8:17 AM, Eric Grobler impalah...@googlemail.com wrote:
 
 Hi Everyone,
 
 Normally one would search for documents where the location is within a
 specified distance, for example widthin 5 km:
 fq={!geofilt pt=45.15,-93.85 sfield=store d=5}
 http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&fq=%7B!geofilt%20pt=45.15,-93.85%20sfield=store%20d=5%7D
 
 It there a way to specify a range between 10 and 20 km?
 Something like:
 fq={!geofilt pt=45.15,-93.85 sfield=store distancefrom=10 distanceupto=20}
 
 Thanks
 Ericz
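
A minimal sketch of the frange approach Bill confirms, reusing the example point and field above: keep only documents whose distance falls between 10 and 20 km:

  fq={!frange l=10 u=20}geodist()&sfield=store&pt=45.15,-93.85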


Re: Skipping caches on a /select

2013-10-17 Thread Bill Bell
But global on a qt would be awesome !!!

Bill Bell
Sent from mobile


 On Oct 17, 2013, at 2:43 PM, Yonik Seeley ysee...@gmail.com wrote:
 
 There isn't a global  cache=false... it's a local param that can be
 applied to any fq or q parameter independently.
 
 -Yonik
 
 
 On Thu, Oct 17, 2013 at 4:39 PM, Tim Vaillancourt t...@elementspace.com 
 wrote:
 Thanks Yonik,
 
 Does cache=false apply to all caches? The docs make it sound like it is
 for filterCache only, but I could be misunderstanding.
 
 When I force a commit and perform a /select a query many times with
 cache=false, I notice my query gets cached still, my guess is in the
 queryResultCache. At first the query takes 500ms+, then all subsequent
 requests take 0-1ms. I'll confirm this queryResultCache assumption today.
 
 Cheers,
 
 Tim
 
 
 On 16/10/13 06:33 PM, Yonik Seeley wrote:
 
 On Wed, Oct 16, 2013 at 6:18 PM, Tim Vaillancourtt...@elementspace.com
 wrote:
 
 I am debugging some /select queries on my Solr tier and would like to see
 if there is a way to tell Solr to skip the caches on a given /select
 query
 if it happens to ALREADY be in the cache. Live queries are being inserted
 and read from the caches, but I want my debug queries to bypass the cache
 entirely.
 
 I do know about the cache=false param (that causes the results of a
 select to not be INSERTED in to the cache), but what I am looking for
 instead is a way to tell Solr to not read the cache at all, even if there
 actually is a cached result for my query.
 
 Yeah, cache=false for q or fq should already not use the cache at
 all (read or write).
 
 -Yonik
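
A small sketch of the local param in use (field names hypothetical); it is applied per parameter, so to bypass both the filterCache and the queryResultCache it goes on both q and fq:

  q={!cache=false}title:solr&fq={!cache=false}category:books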


DIH

2013-10-15 Thread Bill Bell
We have a custom Field processor in DIH and we are not CPU bound on one core... 
How do we thread it ?? We need to use more cores

The box has 32 cores and 1 is 100% CPU bound.

Ideas ?

Bill Bell
Sent from mobile



Re: DIH

2013-10-15 Thread Bill Bell
We are NOW CPU bound Thoughts ???

Bill Bell
Sent from mobile


 On Oct 15, 2013, at 8:49 PM, Bill Bell billnb...@gmail.com wrote:
 
 We have a custom Field processor in DIH and we are not CPU bound on one 
 core... How do we thread it ?? We need to use more cores
 
 The box has 32 cores and 1 is 100% CPU bound.
 
 Ideas ?
 
 Bill Bell
 Sent from mobile
 


Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Bill Bell
Does this work ?
I can suggest -XX:-UseLoopPredicate to switch off predicates.

???

Which version of 7 is recommended ?

Bill Bell
Sent from mobile


 On Oct 10, 2013, at 11:29 AM, Smiley, David W. dsmi...@mitre.org wrote:
 
 *Don't* use JDK 7u40, it's been known to cause index corruption and
 SIGSEGV faults with Lucene: LUCENE-5212   This has not been unnoticed by
 Oracle.
 
 ~ David
 
 On 10/10/13 12:34 PM, Guido Medina guido.med...@temetra.com wrote:
 
 2. Java version: There are huges performance winning between Java 5, 6
   and 7; we use Oracle JDK 7u40.
 


Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields

2013-10-09 Thread Bill Bell
You have to update the whole record including all fields...

Bill Bell
Sent from mobile


 On Oct 9, 2013, at 7:50 PM, deniz denizdurmu...@gmail.com wrote:
 
 hi all,
 
 I have encountered some problems and post it on stackoverflow here:
 http://stackoverflow.com/questions/19285251/solr-field-with-default-value-resets-itself-if-it-is-stored-false
  
 
 as you can see from the response, does it make sense to open a bug ticket
 for this? because, although i can workaround this by setting everything back
 to stored=true, it does not make sense to keep every field stored while i
 dont need to return them in the search result.. or will anyone can make more
 detailed explanations that this is expected and normal? 
 
 
 
 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Field-with-default-value-and-stored-false-will-be-reset-back-to-the-default-value-in-case-of-updatins-tp4094508.html
 Sent from the Solr - User mailing list archive at Nabble.com.
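
For context, a sketch of the atomic update involved (field names hypothetical). Solr rebuilds the document from its stored fields, so any stored="false" field, including one filled by a default value, is dropped and re-defaulted on the next update, which is the behaviour described in the linked question:

  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-type:application/json' \
    -d '[{"id":"1","price":{"set":99}}]'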


Re: Solr 4.5 spatial search - distance and score

2013-09-13 Thread Bill Bell
You can apply his 4.5 patches to 4.4 or take trunk and it is there

Bill Bell
Sent from mobile


On Sep 12, 2013, at 6:23 PM, Weber solrmaill...@fluidolabs.com wrote:

 I'm trying to get score by using a custom boost and also get the distance. I
 found David's code* to get it using Intersects, which I want to replace by
 {!geofilt} or geodist()
 
 *David's code: https://issues.apache.org/jira/browse/SOLR-4255
 
 He told me geodist() will be available again for this kind of field, which
 is a geohash type.
 
 Then, I'd like to know how it can be done today on 4.4 with {!geofilt} and
 how it will be done on 4.5 using geodist()
 
 Thanks in advance.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-4-5-spatial-search-distance-and-score-tp4089706.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Some highlighted snippets aren't being returned

2013-09-08 Thread Bill Bell
Zip up all your configs 

Bill Bell
Sent from mobile


On Sep 8, 2013, at 3:00 PM, Eric O'Hanlon elo2...@columbia.edu wrote:

 Hi again Everyone,
 
 I didn't get any replies to this, so I thought I'd re-send in case anyone 
 missed it and has any thoughts.
 
 Thanks,
 Eric
 
 On Aug 7, 2013, at 1:51 PM, Eric O'Hanlon elo2...@columbia.edu wrote:
 
 Hi Everyone,
 
 I'm facing an issue in which my solr query is returning highlighted snippets 
 for some, but not all results.  For reference, I'm searching through an 
 index that contains web crawls of human-rights-related websites.  I'm 
 running solr as a webapp under Tomcat and I've included the query's solr 
 params from the Tomcat log:
 
 ...
 webapp=/solr-4.2
 path=/select
 params={facet=true&sort=score+desc&group.limit=10&spellcheck.q=Unangan&f.mimetype_code.facet.limit=7&hl.simple.pre=<code>&q.alt=*:*&f.organization_type__facet.facet.limit=6&f.language__facet.facet.limit=6&hl=true&f.date_of_capture_.facet.limit=6&group.field=original_url&hl.simple.post=</code>&facet.field=domain&facet.field=date_of_capture_&facet.field=mimetype_code&facet.field=geographic_focus__facet&facet.field=organization_based_in__facet&facet.field=organization_type__facet&facet.field=language__facet&facet.field=creator_name__facet&hl.fragsize=600&f.creator_name__facet.facet.limit=6&facet.mincount=1&qf=text^1&hl.fl=contents&hl.fl=title&hl.fl=original_url&wt=ruby&f.geographic_focus__facet.facet.limit=6&defType=edismax&rows=10&f.domain.facet.limit=6&q=Unangan&f.organization_based_in__facet.facet.limit=6&q.op=AND&group=true&hl.usePhraseHighlighter=true}
  hits=8 status=0 QTime=108
 ...
 
 For the query above (which can be simplified to say: find all documents that 
 contain the word unangan and return facets, highlights, etc.), I get five 
 search results.  Only three of these are returning highlighted snippets.  
 Here's the highlighting portion of the solr response (note: printed in 
 ruby notation because I'm receiving this response in a Rails app):
 
 
 highlighting=
 {20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf=
   {},
  
 20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf=
   {},
  
 20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf=
   {},
  20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf=
   {contents=
 [...actual snippet is returned here...]},
  20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf=
   {contents=
 [...actual snippet is returned here...]},
  
 20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-uu-no-39-tahun-1999=
   {contents=
 [...actual snippet is returned here...]},
  
 20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-39-tahun-1999?tmpl=componentformat=raw=
   {contents=
 [...actual snippet is returned here...]},
  
 20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf=
   {}}
 
 
 I have eight (as opposed to five) results above because I'm also doing a 
 grouped query, grouping by a field called original_url, and this leads to 
 five grouped results.
 
 I've confirmed that my highlight-lacking results DO contain the word 
 unangan, as expected, and this term is appearing in a text field that's 
 indexed and stored, and being searched for all text searches.  For example, 
 one of the search results is for a crawl of this document: 
 http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf
 
 And if you view that document on the web, you'll see that it does contain 
 unangan.
 
 Has anyone seen this before?  And does anyone have any good suggestions for 
 troubleshooting/fixing the problem?
 
 Thanks!
 
 - Eric
 


Re: Concat 2 fields in another field

2013-08-27 Thread Bill Bell
If it's just for search, copyField into a multivalued field.

Or do it on indexing using DIH or code. A rhino script works too.

Bill Bell
Sent from mobile


On Aug 27, 2013, at 7:15 AM, Jack Krupansky j...@basetechnology.com wrote:

 I have additional examples in the two most recent early access releases of my 
 book - variations on using the existing update processors.
 
 -- Jack Krupansky
 
 -Original Message- From: Federico Chiacchiaretta
 Sent: Tuesday, August 27, 2013 8:39 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Concat 2 fields in another field
 
 Hi,
 we do the same thing using an update request processor chain, this is the
 snippet from solrconfig.xml
 
 <updateRequestProcessorChain name="concatenation">
   <processor class="solr.CloneFieldUpdateProcessorFactory">
     <str name="source">firstname</str>
     <str name="dest">concatfield</str>
   </processor>
   <processor class="solr.CloneFieldUpdateProcessorFactory">
     <str name="source">lastname</str>
     <str name="dest">concatfield</str>
   </processor>
   <processor class="solr.ConcatFieldUpdateProcessorFactory">
     <str name="fieldName">concatfield</str>
     <str name="delimiter">_</str>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory"/>
   <processor class="solr.RunUpdateProcessorFactory"/>
 </updateRequestProcessorChain>
 
 
 Regards,
 Federico Chiacchiaretta
 
 
 
 2013/8/27 Markus Jelsma markus.jel...@openindex.io
 
 You may be more interested in the ConcatFieldUpdateProcessorFactory:
 
 http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html
 
 
 
 -Original message-
  From:Alok Bhandari alokomprakashbhand...@gmail.com
  Sent: Tuesday 27th August 2013 14:05
  To: solr-user@lucene.apache.org
  Subject: Re: Concat 2 fields in another field
 
  Thanks for reply.
 
  But I don't want to introduce any scripting in my code so want to know  is
  there any Java component available for the same.
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786p4086791.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 


Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-27 Thread Bill Bell
Index and query

<analyzer type="index">

Bill Bell
Sent from mobile


On Aug 26, 2013, at 5:42 AM, skorrapa korrapati.sus...@gmail.com wrote:

 I have also re-indexed the data and tried. And also tried with the below:
  <fieldType name="string_lower_case" class="solr.TextField"
             sortMissingLast="true" omitNorms="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="select">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
 This didn't work as well...
 
 
 
 On Mon, Aug 26, 2013 at 4:03 PM, skorrapa [via Lucene] 
 ml-node+s472066n4086601...@n3.nabble.com wrote:
 
 Hello All,
 
 I am still facing the same issue. Case-insensitive search is not working on
 Solr 4.3.
 I am using the below configurations in schema.xml
 <fieldType name="string_lower_case" class="solr.TextField"
            sortMissingLast="true" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="select">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 Basically I want my string which could have spaces or characters like '-'
 or \ to be searched upon case insensitively.
 Please help.
 
 
 --
 If you reply to this email, your message will be added to the discussion
 below:
 
 http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086601.html
 To unsubscribe from Solr 4.2.1 update to 4.3/4.4 problem, click 
 herehttp://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4081896code=a29ycmFwYXRpLnN1c2htYUBnbWFpbC5jb218NDA4MTg5Nnw0MjEwNTY0Mzc=
 .
 NAMLhttp://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086606.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Bill Bell
This seems like a fairly large issue. Can you create a Jira issue ?

Bill Bell
Sent from mobile


On Jul 30, 2013, at 12:34 PM, Dotan Cohen dotanco...@gmail.com wrote:

 On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal alghos...@gmail.com wrote:
 Does adding facet.mincount=2 help?
 
 
 
 In fact, when adding facet.mincount=20 (I know that some dupes are in
 the hundreds) I got the OutOfMemoryError in seconds instead of
 minutes.
 
 -- 
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com


Re: Performance question on Spatial Search

2013-07-29 Thread Bill Bell
Can you compare with the old geo handler as a baseline?

Bill Bell
Sent from mobile


On Jul 29, 2013, at 4:25 PM, Erick Erickson erickerick...@gmail.com wrote:

 This is very strange. I'd expect slow queries on
 the first few queries while these caches were
 warmed, but after that I'd expect things to
 be quite fast.
 
 For a 12G index and 256G RAM, you have on the
 surface a LOT of hardware to throw at this problem.
 You can _try_ giving the JVM, say, 18G but that
 really shouldn't be a big issue, your index files
 should be MMaped.
 
 Let's try the crude thing first and give the JVM
 more memory.
 
 FWIW
 Erick
 
 On Mon, Jul 29, 2013 at 4:45 PM, Steven Bower smb-apa...@alcyon.net wrote:
 I've been doing some performance analysis of a spacial search use case I'm
 implementing in Solr 4.3.0. Basically I'm seeing search times alot higher
 than I'd like them to be and I'm hoping people may have some suggestions
 for how to optimize further.
 
 Here are the specs of what I'm doing now:
 
 Machine:
 - 16 cores @ 2.8ghz
 - 256gb RAM
 - 1TB (RAID 1+0 on 10 SSD)
 
 Content:
 - 45M docs (not very big only a few fields with no large textual content)
 - 1 geo field (using config below)
 - index is 12gb
 - 1 shard
 - Using MMapDirectory
 
 Field config:
 
  <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
             distErrPct="0.025" maxDistErr="0.00045"
             spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
             units="degrees"/>

  <field name="geopoint" indexed="true" multiValued="false"
         required="false" stored="true" type="geo"/>
 
 
 What I've figured out so far:
 
 - Most of my time (98%) is being spent in
 java.nio.Bits.copyToByteArray(long,Object,long,long) which is being
 driven by BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
 which from what I gather is basically reading terms from the .tim file
 in blocks
 
 - I moved from Java 1.6 to 1.7 based upon what I read here:
 http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
 and it definitely had some positive impact (i haven't been able to
 measure this independantly yet)
 
 - I changed maxDistErr from 0.09 (which is 1m precision per docs)
 to 0.00045 (50m precision) ..
 
 - It looks to me that the .tim file are being memory mapped fully (ie
 they show up in pmap output) the virtual size of the jvm is ~18gb
 (heap is 6gb)
 
 - I've optimized the index but this doesn't have a dramatic impact on
 performance
 
 Changing the precision and the JVM upgrade yielded a drop from ~18s
 avg query time to ~9s avg query time.. This is fantastic but I want to
 get this down into the 1-2 second range.
 
 At this point it seems that basically i am bottle-necked on basically
 copying memory out of the mapped .tim file which leads me to think
 that the only solution to my problem would be to read less data or
 somehow read it more efficiently..
 
 If anyone has any suggestions of where to go with this I'd love to know
 
 
 thanks,
 
 steve


Re: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread Bill Bell
I get a similar situation using Windows 2008 and Solr 3.6. Memory using mmap is 
never released. Even if I turn off traffic and commit and do a manual gc. If 
the size of the index is 3gb then memory used will be heap + 3gb of shared 
used. If I use a 6gb index I get heap + 6gb. If I turn off MMapDirectoryFactory 
it goes back down. When is the MMap supposed to release memory ? It only does 
it on JVM restart now.

Bill Bell
Sent from mobile


On Jul 22, 2012, at 6:21 AM, geetha anjali anjaliprabh...@gmail.com wrote:

 It happens in 3.6; for this reason I thought of moving to Solandra.
 If I do a commit, the all documents are persisted with out any issues.
 There are no issues in terms of functionality; the only thing that happens is that
 physical RAM usage goes higher and higher, stops at the maximum, and it
 never comes down.
 
 Thanks
 
 On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog goks...@gmail.com wrote:
 
 Interesting. Which version of Solr is this? What happens if you do a
 commit?
 
 On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali anjaliprabh...@gmail.com
 wrote:
 Hi Uwe,
 Great to know. We have files indexing at 1/min. After 30 mins I see all
 my physical memory saying it's 100 percent used (Windows). On deep
 investigation I found that mmap is not releasing OS file handles. Do you
 see this behaviour?
 
 Thanks
 
 On 20 Jul 2012 14:04, Uwe Schindler u...@thetaphi.de wrote:
 
 Hi Bill,
 
 MMapDirectory uses the file system cache of your operating system, which
 has
 following consequences: In Linux, top & free should normally report only
 *few* free memory, because the O/S uses all memory not allocated by
 applications to cache disk I/O (and shows it as allocated, so having 0%
 free
 memory is just normal on Linux and also Windows). If you have other
 applications or Lucene/Solr itself that allocate lots of heap space or
 malloc() a lot, then you are reducing free physical memory, so reducing
 fs
 cache. This depends also on your swappiness parameter (if swappiness is
 higher, inactive processes are swapped out easier, default is 60% on
 linux -
 freeing more space for FS cache - the backside is of course that maybe
 in-memory structures of Lucene and other applications get paged out).
 
 You will only see no paging at all if all memory allocated all
 applications
 + all mmapped files fit into memory. But paging in/out the mmapped Lucene
 index is much cheaper than using SimpleFSDirectory or
 NIOFSDirectory. If
 you use SimpleFS or NIO and your index is not in FS cache, it will also
 read
 it from physical disk again, so where is the difference. Paging is
 actually
 cheaper as no syscalls are involved.
 
 If you want as much as possible of your index in physical RAM, copy it to
 /dev/null regularly and buy more RUM :-)
 
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: uwe@thetaphi...
 
 From: Bill Bell [mailto:billnb...@gmail.com]
 Sent: Friday, July 20, 2012 5:17 AM
 Subject: Re: ...
 stop using it? The least used memory will be removed from the OS
 automatically? I see some paging. Wouldn't paging slow down the querying?
 
 
 My index is 10gb and every 8 hours we get most of it in shared memory. The
 memory is 99 percent used, and that does not leave any room for other
 apps.
 
 Other implications?
 
 Sent from my mobile device
 720-256-8076
 
 On Jul 19, 2012, at 9:49 A...
 Heap space or free system RAM:
 
 
 
 http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
 
 Uwe
 ...
 use it since you might run out of memory on large indexes right?
 
 
 Here is how I got iSimpleFSDirectoryFactory to work. Just set -
 Dsolr.directoryFactor...
 set it all up with a helper in solrconfig.xml...
 
 
 if (Constants.WINDOWS) {
 if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64...
 
 
 
 --
 Lance Norskog
 goks...@gmail.com
 


Re: How to setup SimpleFSDirectoryFactory

2012-07-19 Thread Bill Bell
Thanks. Are you saying that if we run low on memory, the MMapDirectory will 
stop using it? The least used memory will be removed from the OS automatically? 
I see some paging. Wouldn't paging slow down the querying?

My index is 10gb and every 8 hours we get most of it in shared memory. The 
memory is 99 percent used, and that does not leave any room for other apps. 

Other implications?

Sent from my mobile device
720-256-8076

On Jul 19, 2012, at 9:49 AM, Uwe Schindler u...@thetaphi.de wrote:

 Read this, then you will see that MMapDirectory will use 0% of your Java Heap 
 space or free system RAM:
 
 http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 
 -Original Message-
 From: William Bell [mailto:billnb...@gmail.com]
 Sent: Tuesday, July 17, 2012 6:05 AM
 Subject: How to setup SimpleFSDirectoryFactory
 
 We all know that MMapDirectory is fastest. However we cannot always use it
 since you might run out of memory on large indexes right?
 
 Here is how I got SimpleFSDirectoryFactory to work. Just set
 -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory.
 
 Your solrconfig.xml:
 
 <directoryFactory name="DirectoryFactory"
   class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
 
 You can check it with http://localhost:8983/solr/admin/stats.jsp
 
 Notice that the default for Windows 64-bit is MMapDirectory, else
 NIOFSDirectory (except for Windows). It would be nicer if we just set it all up
 with a helper in solrconfig.xml...
 
 if (Constants.WINDOWS) {
   if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
     return new MMapDirectory(path, lockFactory);
   else
     return new SimpleFSDirectory(path, lockFactory);
 } else {
   return new NIOFSDirectory(path, lockFactory);
 }
 
 
 
 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076
 
 


Re: Mmap

2012-07-16 Thread Bill Bell
Any thoughts on this? Is the default Mmap?



Sent from my mobile device
720-256-8076

On Feb 14, 2012, at 7:16 AM, Bill Bell billnb...@gmail.com wrote:

 Does someone have an example of using unmap in 3.5 and chunksize?
 
 I am using Solr 3.5.
 
 I noticed in solrconfig.xml:
 
 <directoryFactory name="DirectoryFactory"
   class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
 
 I don't see this parameter taking.. When I set 
 -Dsolr.directoryFactory=solr.MMapDirectoryFactory
 
 How do I see the setting in the log or in stats.jsp ? I cannot find a place 
 that indicates it is set or not.
 
 I would assume StandardDirectoryFactory is being used but I do see (when I 
 set it or NOT set it)
 
 Bill Bell
 Sent from mobile
 


Re: Problem with sorting solr docs

2012-07-04 Thread Bill Bell
Would all optional fields need the sortmissinglast and sortmissingfirst set 
even when not sorting on that field? Seems broken to me.

Sent from my Mobile device
720-256-8076

On Jul 3, 2012, at 6:45 AM, Shubham Srivastava 
shubham.srivast...@makemytrip.com wrote:

 Just adding to the below-- If there is a field(say X) which is not populated 
 and in the query I am not sorting on this particular field but on another 
 field (say Y) still the result ordering would depend on X .
 
 In fact, in the problem below mentioned by Harsh, making X
 sortMissingLast="false" sortMissingFirst="false" solved the problem while in
 the query he was sorting on Y. This seems a bit illogical.
 
 Regards,
 Shubham
 
 From: Harshvardhan Ojha [harshvardhan.o...@makemytrip.com]
 Sent: Tuesday, July 03, 2012 5:58 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Problem with sorting solr docs
 
 Hi,
 
 I have added <field name="latlng" indexed="true" stored="true"
 sortMissingLast="false" sortMissingFirst="false"/> to my schema.xml, although
 I am searching on name field.
 It seems to be working fine. What is its default behavior?
 
 Regards
 Harshvardhan Ojha
 
 -Original Message-
 From: Rafał Kuć [mailto:r@solr.pl]
 Sent: Tuesday, July 03, 2012 5:35 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Problem with sorting solr docs
 
 Hello!
 
 But the latlng field is not taken into account when sorting with sort defined 
 such as in your query. You only sort on the name field and only that field. 
 You can also define Solr behavior when there is no value in the field, by
 adding sortMissingLast="true" or sortMissingFirst="true" to your type
 definition in the schema.xml file.
 
 --
 Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
 
 Hi,
 
 Thanks for reply.
 I want to sort my docs on name field, it is working well only if I have all 
 fields populated well.
 But my latlng field is optional, every doc will not have this value.
 So those docs are not getting sorted.
 
 Regards
 Harshvardhan Ojha
 
 -Original Message-
 From: Rafał Kuć [mailto:r@solr.pl]
 Sent: Tuesday, July 03, 2012 5:24 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Problem with sorting solr docs
 
 Hello!
 
 Your query suggests that you are sorting on the 'name' field instead
 of the latlng field (sort=name +asc).
 
 The question is what you are trying to achieve ? Do you want to sort
 your documents from a given geographical point ? If that's the case
 you may want to look here:
 http://wiki.apache.org/solr/SpatialSearch/
 and look at the possibility of sorting on the distance from a given point.
 
 --
 Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
 ElasticSearch
 
 
 Hi,
 
 I have 260 docs which I want to sort on a single field latlng.
 <doc>
   <str name="id">1</str>
   <str name="name">Amphoe Khanom</str>
   <str name="latlng">1.0,1.0</str>
 </doc>
 
 My query is :
 http://localhost:8080/solr/select?q=*:*&sort=name+asc
 
 This query sorts all documents except those which don’t have latlng,
 and I can’t keep any default value for this field.
 My question is how can I sort all docs on latlng?
 
 Regards
 Harshvardhan Ojha  | Software Developer - Technology Development
|  MakeMyTrip.com, 243 SP Infocity, Udyog Vihar Phase 1, Gurgaon,
 Haryana - 122 016, India
 
 What's new?: Inspire - Discover an inspiring new way to plan and book travel 
 online.
 
 
 Office Map
 
 Facebook
 
 Twitter
 
 
 
 


Re: UI

2012-05-21 Thread Bill Bell
The php.net plugin is the best. SolrPHPClient is missing several features.

Sent from my Mobile device
720-256-8076

On May 21, 2012, at 6:35 AM, Tolga to...@ozses.net wrote:

 Hi,
 
 Can you recommend a good PHP UI to search? Is SolrPHPClient good?


Re: slave index not cleaned

2012-05-14 Thread Bill Bell
This is a known issue in 1.4 especially in Windows. Some of it was resolved in 
3x.

Bill Bell
Sent from mobile


On May 14, 2012, at 5:54 AM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, replication will require up to twice the space of the
 index _temporarily_, just checking if that's what you're seeing
 But that should go away reasonably soon. Out of curiosity, what
 happens if you restart your server, do the extra files go away?
 
 But it sounds like your index is growing over a longer period of time
 than just a single replication, is that true?
 
 Best
 Erick
 
 On Fri, May 11, 2012 at 6:03 AM, Jasper Floor jasper.fl...@m4n.nl wrote:
 Hi,
 
 On Thu, May 10, 2012 at 5:59 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 Hi Jasper,
 
 Sorry, I should've added more technical info wihtout being prompted.
 
 Solr does handle that for you.  Some more stuff to share:
 
 * Solr version?
 
 1.4
 
 * JVM version?
 1.7 update 2
 
 * OS?
 Debian (2.6.32-5-xen-amd64)
 
 * Java replication?
 yes
 
 * Errors in Solr logs?
 no
 
 * deletion policy section in solrconfig.xml?
 missing I would say, but I don't see this on the replication wiki page.
 
 This is what we have configured for replication:
 
 requestHandler name=/replication class=solr.ReplicationHandler 
lst name=slave
 
str 
 name=masterUrl${solr.master.url}/df-stream-store/replication/str
 
str name=pollInterval00:20:00/str
str name=compressioninternal/str
str name=httpConnTimeout5000/str
str name=httpReadTimeout1/str
 
 /lst
 /requestHandler
 
 We will be updating to 3.6 fairly soon however. To be honest, from
 what I've read, the Solr cloud is what we really want in the future
 but we will have to be patient for that.
 
 thanks in advance
 
 mvg,
 Jasper
 
 You may also want to look at your Index report in SPM 
 (http://sematext.com/spm) before/during/after replication and share what 
 you see.
 
 Otis
 
 Performance Monitoring for Solr / ElasticSearch / HBase - 
 http://sematext.com/spm
 
 
 
 - Original Message -
 From: Jasper Floor jasper.fl...@m4n.nl
 To: solr-user@lucene.apache.org
 Cc:
 Sent: Thursday, May 10, 2012 9:08 AM
 Subject: slave index not cleaned
 
 Perhaps I am missing the obvious but our slaves tend to run out of
 disk space. The index sizes grow to multiple times the size of the
 master. So I just toss all the data and trigger a replication.
 However, can't solr handle this for me?
 
 I'm sorry if I've missed a simple setting which does this for me, but
 if its there then I have missed it.
 
 mvg
 Jasper
 


Re: Replication. confFiles and permissions.

2012-05-09 Thread Bill Bell
Why would you replicate data import properties? The master does the importing 
not the slave...

Sent from my Mobile device
720-256-8076

On May 9, 2012, at 7:23 AM, stockii stock.jo...@googlemail.com wrote:

 Hello.
 
 
 I am running Solr replication. It works well, but I need to replicate my
 dataimport-properties file.

 If server1 replicates this file, it creates a new file every time, with a
 *.timestamp suffix, because the first replication run created this file with the wrong
 permissions ...

 How can I tell Solr replication to chmod 755 dataimport-properties ...?
 ;-)
 
 thx
 
 -
 --- System 
 
 
 One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 
 1 Core with 45 Million Documents other Cores  200.000
 
 - Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
 - Solr2 for Update-Request  - delta every Minute - 4GB Xmx
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Replication-confFiles-and-permissions-tp3973825.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is it possible to limit the bandwidth of replication

2012-05-09 Thread Bill Bell
+1 as well especially for larger indexes

Sent from my Mobile device
720-256-8076

On May 9, 2012, at 9:46 AM, Jan Høydahl jan@cominvent.com wrote:

 I think we have to add this for java based rep. 
 +1
 


Re: Solritas in production

2012-05-08 Thread Bill Bell
I would not use Solritas except for very rudimentary solutions and prototypes.

Sent from my Mobile device
720-256-8076

On May 6, 2012, at 6:02 AM, András Bártházi and...@barthazi.hu wrote:

 Hi,
 
 We're currently evaluating Solr as a Sphinx replacement. Our site has
 1.000.000+ pageviews a day, it's a real estate search engine. The
 development is almost done, and it seems to be working fine, however some
 of my colleagues come with the idea that we're using it wrong. We're using
 it as a service from PHP/Symfony.
 
 They think we should use Solritas as a frontend, so site visitors will
 use it directly, no PHP will be involved, and it will use much less
 infrastructure. One of them said that even mobile.de is using it that way (I
 have found no clue about it at all).
 
 Do you think is it a good idea?
 
 Do you know services using Solritas as a frontend on a public site?
 
 My personal opinion is that using Solritas in production is a very bad idea
 for us, but have not so much experience with Solr yet, and Solritas
 documentation is far from a detailed, up-to-date one, so don't really know
 what is it really usable for.
 
 Thanks,
  Andras


Re: change index/store at indexing time

2012-04-27 Thread Bill Bell
Yes you can. Just use a script that is called for each row.
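
As a rough sketch of the wiring (data-config.xml; the entity and field names are hypothetical), a row-level script lets you inspect or rewrite values, while the store/index options themselves still come from the schema:

<dataConfig>
  <script><![CDATA[
    function mutateRow(row) {
      // hypothetical per-row logic: read a value and write it back, possibly transformed
      var geoId = row.get('geoid');
      row.put('geoids', geoId);
      return row;
    }
  ]]></script>
  <document>
    <entity name="listing" query="SELECT ..." transformer="script:mutateRow">
      <!-- field mappings here -->
    </entity>
  </document>
</dataConfig>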

Bill Bell
Sent from mobile


On Apr 27, 2012, at 6:38 PM, Vazquez, Maria (STM) maria.vazq...@dexone.com 
wrote:

 Hi,
 I'm migrating a project from Lucene 2.9 to Solr 3.4.
 There is a special case in the code that indexes the same field in two 
 different ways, which is completely legal in Lucene directly but I don't know 
 how to duplicate this same behavior in Solr:
 
 if (isFirstGeo) {
 document.add(new Field(geoids, geoId, Field.Store.YES, 
 Field.Index.NOT_ANALYZED_NO_NORMS));
 isFirstGeo = false;
 } else {
 if (countProducts  100)
  document.add(new Field(geoids, geoId, Field.Store.NO, 
 Field.Index.NOT_ANALYZED_NO_NORMS));
 else
  document.add(new Field(geoids, geoId, Field.Store.YES, 
 Field.Index.NO));
 }
 
 Is there any way to do this in Solr in a Transformer? I'm using the DIH to
 index and I can't see a way to do this other than having three fields in the 
 schema like geoids_store_index, geoids_nostore_index, and 
 geoids_store_noindex.
 
 Thanks a lot in advance.
 Maria
 
 
 


Re: commit stops

2012-04-27 Thread Bill Bell
We also see extreme slowness using Solr 3.6 when trying to commit a delete. We 
also get hangs. We do 1 commit at most a week. Rebuilding from scratch using
DIH works fine and has never hung.

Bill Bell
Sent from mobile


On Apr 27, 2012, at 5:59 PM, mav.p...@holidaylettings.co.uk 
mav.p...@holidaylettings.co.uk wrote:

 Thanks for the reply
 
 The client expects a response within 2 minutes and after that will report
 an error. When we build fresh it seems to work and the operation takes a
 second or two to complete. Once it gets to a stage it hangs it simply
 won't accept any further commits. I did an index check and all was ok.
 
 I don't see any major commit happening at any time, it seems to just
 hang. Even starting up and shutting down takes ages.
 
 We make 3 - 4 commits a day.
 
 We use solr 3.5
 
 No autocommit
 
 
 
 On 28/04/2012 00:56, Yonik Seeley yo...@lucidimagination.com wrote:
 
 On Fri, Apr 27, 2012 at 9:18 AM, mav.p...@holidaylettings.co.uk
 mav.p...@holidaylettings.co.uk wrote:
 We have an index of about 3.5gb which seems to work fine until it
 suddenly stops accepting new commits.
 
 Users can still search on the front end but nothing new can be
 committed and it always times out on commit.
 
 Any ideas?
 
 Perhaps the commit happens to cause a major merge which may take a
 long time (and solr isn't going to allow overlapping commits).
 How long does a commit request take to time out?
 
 What Solr version is this?  Do you have any kind of auto-commit set
 up?  How often are you manually committing?
 
 -Yonik
 lucenerevolution.com - Lucene/Solr Open Source Search Conference.
 Boston May 7-10
 


Re: Does Solr fit my needs?

2012-04-27 Thread Bill Bell
You could use SQL Server and External Fields in Solr to get what you need from 
the database on result of the query.

Bill Bell
Sent from mobile


On Apr 27, 2012, at 8:31 AM, G.Long jde...@gmail.com wrote:

 Hi there :)
 
 I'm looking for a way to save xml files into some sort of database and i'm 
 wondering if Solr would fit my needs.
 The xml files I want to save have a lot of child nodes which also contain 
 child nodes with multiple values. The depth level can be more than 10.
 
 After having indexed the files, I would like to be able to query for subparts 
 of those xml files and be able to reconstruct them as xml files with all 
 their children included. However, I'm wondering if it is possible with an 
 index like solr lucene to keep or easily recover the structure of my xml data?
 
 Thanks for your help,
 
 Regards,
 
 Gary


Question concerning date fields

2012-04-20 Thread Bill Bell
We are loading a long (number of seconds since 1970?) value into Solr using 
java and Solrj. What is the best way to convert this into the right Solr date 
fields?
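
In case it's useful, a minimal SolrJ-side sketch of one common approach (field names are hypothetical; it assumes the target is a Solr date field, e.g. a TrieDateField named created_at): scale the seconds to milliseconds and hand SolrJ a java.util.Date, which it serializes into the ISO-8601 form Solr date fields expect.

import java.util.Date;
import org.apache.solr.common.SolrInputDocument;

long epochSeconds = 1334899200L;              // the long coming from the source system
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "some-id");
// java.util.Date is millisecond-based, so multiply the seconds by 1000
doc.addField("created_at", new Date(epochSeconds * 1000L));
// solrServer.add(doc); solrServer.commit();  // with an existing SolrServer instance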

Sent from my Mobile device
720-256-8076


Re: ExtractingRequestHandler

2012-04-01 Thread Bill Bell
I have had good luck with creating a separate core index for just data. This is 
a different core than the indexed core.

Very fast.

Bill Bell
Sent from mobile


On Apr 1, 2012, at 11:15 AM, Erick Erickson erickerick...@gmail.com wrote:

 Yes, you can, but generally storing the raw input in Solr is
 not the best approach. The problem here is that pretty soon
 you get a huge index that contains *everything*. Solr was not
 intended to be a data store.
 
 Besides, you then need to store the binary form of the file. Solr
 only deals with text, not markup.
 
 Most people index the text in Solr, and enough information
 so the application knows where to go to fetch the original
 document when the user drills down (e.g. file path, database
 PK, etc). Would that work for your situation?
 
 Best
 Erick
 
 On Sat, Mar 31, 2012 at 3:55 PM,  spr...@gmx.eu wrote:
 Hi,
 
 I want to index various filetypes in solr; this can easily be done with
 ExtractingRequestHandler. But I also need the extracted content back.
 I know ext.extract.only but then nothing gets indexed, right?
 
 Can I index the document AND get the content back as with ext.extract.only?
 In a single request?
 
 Thank you
 
 


Re: Empty facet counts

2012-03-29 Thread Bill Bell
Send schema.xml and did you apply any patches? What version of Solr?

Bill Bell
Sent from mobile


On Mar 29, 2012, at 5:26 AM, Youri Westerman yo...@pluxcustoms.nl wrote:

 Hi,
 
 I'm currently learning how to use solr and everything seems pretty straight
 forward. For some reason when I use faceted queries it returns only empty
 sets in the facet_count section.
 
 The get params I'm using are:
  ?q=*:*&rows=0&facet=true&facet.field=urn
 
 The result:
  facet_counts: {
 
  facet_queries: { },
  facet_fields: { },
  facet_dates: { },
  facet_ranges: { }
 
  }
 
 The urn field is indexed and there are enough entries to be counted. When
 adding facet.method=Enum, nothing changes.
 Does anyone know why this is happening? Am I missing something?
 
 Thanks in advance!
 
 Youri


Re: DataImportHandler: backups prior to full-import

2012-03-28 Thread Bill Bell
You could use the Solr Command Utility SCU that runs from Windows and can be 
scheduled to run. 

https://github.com/justengland/Solr-Command-Utility

This is a Windows system that will index using a core, and swap it if it
succeeds. It works with Solr.

Let me know if you have any questions.

On Mar 28, 2012, at 10:11 PM, Shawn Heisey s...@elyograg.org wrote:

 On 3/28/2012 12:46 PM, Artem Shnayder wrote:
 Does anyone know of any work done to automatically run a backup prior to a
 DataImportHandler full-import?
 
 I've asked this question on #solr and was pointed to
 https://wiki.apache.org/solr/SolrReplication?highlight=%28backup%29#HTTP_API
 which
 is helpful but is not an automatic backup in the context of full-import's.
 I'm wondering if anyone else has done this work yet.
 
 I have located a previous message from you where you mention that you are on 
 Ubuntu.  If that's true, you can use hard links to make nearly instantaneous 
 backups with a single command:
 
 ln /path/to/index/* /path/to/backup/.
 
 One caveat to that - the backup must be on the same filesystem as the index.  
 If keeping backups on another filesystem (or even another computer) is 
 important, then treat the hard link backup as a temporary directory.  Copy 
 the files from that directory to your remote location, then delete them.
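  A minimal shell sketch of that pattern (paths and host are hypothetical):
  
  # near-instant hard-link snapshot on the same filesystem
  mkdir -p /path/to/backup
  ln /path/to/index/* /path/to/backup/.
  # optionally ship it somewhere else, then drop the temporary hard links
  rsync -a /path/to/backup/ backuphost:/backups/solr-index/
  rm /path/to/backup/*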
 
 This works because of the way that Lucene (and by extension Solr) manages 
 files on disk - existing segment files are never modified.  If they get 
 merged, new files are created before the old ones are deleted.  There is only 
 one file in an index directory that does change without getting a new name - 
 segments.gen.  I have verified (on Solr 3.5) that even this file is properly 
 handled so that a hard link backup keeps the correct version.
 
 For people running on Windows, this particular method won't work.  Newer 
 Windows server versions do have one feature that might actually make it 
 possible to do something similar - shadow copies.  I do not know how to 
 leverage the feature, though.
 
 Thanks,
 Shawn
 


Re: Performance Question

2012-03-19 Thread Bill Bell
The size of the index does matter practically speaking.

Bill Bell
Sent from mobile


On Mar 19, 2012, at 11:41 AM, Mikhail Khludnev mkhlud...@griddynamics.com 
wrote:

 Exactly. That's what I mean.
 
 On Mon, Mar 19, 2012 at 6:15 PM, Jamie Johnson jej2...@gmail.com wrote:
 
 Mikhail,
 
 Thanks for the response.  Just to be clear you're saying that the size
 of the index does not matter, it's more the size of the results?
 
 On Fri, Mar 16, 2012 at 2:43 PM, Mikhail Khludnev
 mkhlud...@griddynamics.com wrote:
 Hello,
 
 Frankly speaking, the computational complexity of Lucene search depends
 on the
 size of the search result: numFound*log(start+rows), not on the size of the index.
 
 Regards
 
 On Fri, Mar 16, 2012 at 9:34 PM, Jamie Johnson jej2...@gmail.com
 wrote:
 
 I'm curious if anyone tell me how Solr/Lucene performs in a situation
 where you have 100,000 documents each with 100 tokens vs having
 1,000,000 documents each with 10 tokens.  Should I expect the
 performance to be the same?  Any information would be greatly
 appreciated.
 
 
 
 
 --
 Sincerely yours
 Mikhail Khludnev
 Lucid Certified
 Apache Lucene/Solr Developer
 Grid Dynamics
 
 http://www.griddynamics.com
 mkhlud...@griddynamics.com
 
 
 
 
 -- 
 Sincerely yours
 Mikhail Khludnev
 Lucid Certified
 Apache Lucene/Solr Developer
 Grid Dynamics
 
 http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Solr core swap after rebuild in HA-setup / High-traffic

2012-03-17 Thread Bill Bell
DIH sets the time of update to the start time not the end time,

So when the index is rebuilt, if you run a delta and use the update time you
should be okay. We normally go back a few minutes to make sure we have everything, as a
fail-safe as well.
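
A rough data-config.xml sketch of that pattern (table, column and interval are hypothetical; dataimporter.last_index_time is the variable DIH records in dataimport.properties):

<entity name="item"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE updated_at &gt; DATE_SUB('${dataimporter.last_index_time}', INTERVAL 5 MINUTE)"
        deltaImportQuery="SELECT * FROM item WHERE id='${dataimporter.delta.id}'"/>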

Sent from my Mobile device
720-256-8076

On Mar 14, 2012, at 12:58 PM, KeesSchepers k...@keesschepers.nl wrote:

 Hello everybody,
 
 I am designing a new Solr architecture for one of my clients. This Solr
 architecture is for a high-traffic website with millions of visitors, but I am
 facing some design problems where I hope you guys can help me out.
 
 In my situation there are 4 Solr servers running, 1 server is master and 3
 are slave. They are running Solr version 1.4.
 
 I use two cores 'live' and 'rebuild' and I use Solr DIH to rebuild a core
 which goes like this:
 
 1. I wipe the reindex core
 2. I run the DIH over the complete dataset (4 million documents) in pieces of
 20.000 records (to prevent very long mysql locks)
 3. After the DIH is finished (2 hours) we also have to update the
 rebuild core with changes from the last two hours; this is a problem
 4. After updating is done and the core is not more than some seconds behind
 we want to SWAP the cores.
 
 Everything goes well except for step 3. The rebuild and the core swap is all
 okay. 
 
 Because the website is undergoing changes every minute we cannot pause the
 delta-import on the live core and fall behind for 2 hours. The problem is that I
 can't figure out a scheme that does not delay the live core too long
 and still uses the DIH instead of writing a lot of code.
 
 Did anyone face this problem before or could give me some tips?
 
 Thanks!
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-core-swap-after-rebuild-in-HA-setup-High-traffic-tp3826461p3826461.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamically Load Query Time Synonym File

2012-02-26 Thread Bill Bell
It would depend.

If the synonyms are used on indexing, you need to reindex. Otherwise, you
could reload and use the synonyms on query.
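
A minimal query-side sketch for the second case (schema.xml; the field type name is hypothetical):

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- synonyms applied only at query time, so a core reload picks up file changes without reindexing -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>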

On 2/26/12 4:05 AM, Ahmet Arslan iori...@yahoo.com wrote:


 Is there a way to dynamically load a synonym file without
 restarting solr core ?

There is an open jira for this :
https://issues.apache.org/jira/browse/SOLR-1307





Debugging on 3,5

2012-02-14 Thread Bill Bell

I did find a solution, but the output is horrible. Why does explain look so
bad?

lst name=explainstr name=2H7DF
6.351252 = (MATCH) boost(*:*,query(specialties_ids: #1;#0;#0;#0;#0;#0;#0;#0;#0; 
,def=0.0)), product of:
  1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm
  6.351252 = query(specialties_ids: #1;#0;#0;#0;#0;#0;#0;#0;#0; 
,def=0.0)=6.351252
/str


defType=edismax&boost=query($param)&param=multi_field:87
--


We like the boost parameter in SOLR 3.5 with eDismax.

The question we have is that we would like to replace bq with boost, but we get
the multi-valued field issue when we try to do this.

Bill Bell
Sent from mobile



Mmap

2012-02-14 Thread Bill Bell
Does someone have an example of using unmap in 3.5 and chunksize?

 I am using Solr 3.5.

I noticed in solrconfig.xml:

<directoryFactory name="DirectoryFactory"
  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

I don't see this parameter taking effect. When I set
-Dsolr.directoryFactory=solr.MMapDirectoryFactory

How do I see the setting in the log or in stats.jsp ? I cannot find a place 
that indicates it is set or not.

I would assume StandardDirectoryFactory is being used but I do see (when I set 
it or NOT set it)

Bill Bell
Sent from mobile



Re: Improving performance for SOLR geo queries?

2012-02-14 Thread Bill Bell
Can we get this back ported to 3x?

Bill Bell
Sent from mobile


On Feb 14, 2012, at 3:45 AM, Matthias Käppler matth...@qype.com wrote:

 hey thanks all for the suggestions, didn't have time to look into them
 yet as we're feature-sprinting for MWC, but will report back with some
 feedback over the next weeks (we will have a few more performance
 sprints in March)
 
 Best,
 Matthias
 
 On Mon, Feb 13, 2012 at 2:32 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Thu, Feb 9, 2012 at 1:46 PM, Yonik Seeley yo...@lucidimagination.com 
 wrote:
 One way to speed up numeric range queries (at the cost of increased
 index size) is to lower the precisionStep.  You could try changing
 this from 8 to 4 and then re-indexing to see how that affects your
 query speed.
 
 Your issue, and the fact that I had been looking at the post-filtering
 code again for another client, reminded me that I had been planning on
 implementing post-filtering for spatial.  It's now checked into trunk.
 
 If you have the ability to use trunk, you can add a high cost (like
 cost=200) along with cache=false to trigger it.
 
 More details here:
 http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/
 
 -Yonik
 lucidimagination.com
 
 
 
 -- 
 Matthias Käppler
 Lead Developer API  Mobile
 
 Qype GmbH
 Großer Burstah 50-52
 20457 Hamburg
 Telephone: +49 (0)40 - 219 019 2 - 160
 Skype: m_kaeppler
 Email: matth...@qype.com
 
 Managing Director: Ian Brotherston
 Amtsgericht Hamburg
 HRB 95913
 
 This e-mail and its attachments may contain confidential and/or
 privileged information. If you are not the intended recipient (or have
 received this e-mail in error) please notify the sender immediately
 and destroy this e-mail and its attachments. Any unauthorized copying,
 disclosure or distribution of this e-mail and  its attachments is
 strictly forbidden. This notice also applies to future messages.


Help with MMapDirectoryFactory in 3.5

2012-02-11 Thread Bill Bell
 I am using Solr 3.5.

I noticed in solrconfig.xml:

<directoryFactory name="DirectoryFactory"
  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

I don't see this parameter taking effect. When I set
-Dsolr.directoryFactory=solr.MMapDirectoryFactory

How do I see the setting in the log or in stats.jsp ? I cannot find a place
that indicates it is set or not.

I would assume StandardDirectoryFactory is being used but I do see (when I
set it or NOT set it)

name:  searcher  class:  org.apache.solr.search.SolrIndexSearcher  version:
1.0  description:  index searcher  stats: searcherName :  Searcher@71fc3828
main 
caching :  true 
numDocs :  2121163 
maxDoc :  2121163 
reader :  
SolrIndexReader{this=1867ec28,r=ReadOnlyDirectoryReader@1867ec28,refCnt=1,se
gments=1} 
readerDir :  
org.apache.lucene.store.MMapDirectory@C:\solr\jetty\example\solr\providersea
rch\data\index 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@45c1cfc1
indexVersion :  1324594650551
openedAt :  Sat Feb 11 09:49:31 MST 2012
registeredAt :  Sat Feb 11 09:49:31 MST 2012
warmupTime :  0

Also, how do I set unmap and what is the purpose of chunk size?




Re: Help with MMapDirectoryFactory in 3.5

2012-02-11 Thread Bill Bell
Also, does someone have an example of using unmap in 3.5 and chunksize?

From:  Bill Bell billnb...@gmail.com
Date:  Sat, 11 Feb 2012 10:39:56 -0700
To:  solr-user@lucene.apache.org
Subject:  Help with MMapDirectoryFactory in 3.5

 I am using Solr 3.5.

I noticed in solrconfig.xml:

<directoryFactory name="DirectoryFactory"
  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

I don't see this parameter taking.. When I set
-Dsolr.directoryFactory=solr.MMapDirectoryFactory

How do I see the setting in the log or in stats.jsp ? I cannot find a place
that indicates it is set or not.

I would assume StandardDirectoryFactory is being used but I do see (when I
set it or NOT set it)

name:  searcher  class:  org.apache.solr.search.SolrIndexSearcher  version:
1.0  description:  index searcher  stats: searcherName : Searcher@71fc3828
main 
caching : true 
numDocs : 2121163 
maxDoc : 2121163 
reader : 
SolrIndexReader{this=1867ec28,r=ReadOnlyDirectoryReader@1867ec28,refCnt=1,se
gments=1} 
readerDir : 
org.apache.lucene.store.MMapDirectory@C:\solr\jetty\example\solr\providersea
rch\data\index 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@45c1cfc1
indexVersion : 1324594650551
openedAt : Sat Feb 11 09:49:31 MST 2012
registeredAt : Sat Feb 11 09:49:31 MST 2012
warmupTime : 0 

Also, how do I set unmap and what is the purpose of chunk size?




boost question. need boost to take a query like bq

2012-02-11 Thread Bill Bell


We like the boost parameter in SOLR 3.5 with eDismax.

The question we have is that we would like to replace bq with boost, but we
get the multi-valued field issue when we try to do the equivalent queries…
HTTP ERROR 400
Problem accessing /solr/providersearch/select. Reason:
can not use FieldCache on multivalued field: specialties_ids


q=*:*&bq=multi_field:87^2&defType=dismax

How do you do this using boost?

q=*:*&boost=multi_field:87&defType=edismax

We know we can use bq with edismax, but we like the multiply feature of
boost.

If I change it to a single valued field I get results, but they are all 1.0.

str name=YFFL5
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm
/str

q=*:*&boost=single_field:87&defType=edismax  // this works, but I need it on
multivalued






FW: boost question. need boost to take a query like bq

2012-02-11 Thread Bill Bell


I did find a solution, but the output is horrible. Why does explain look so
bad?

lst name=explainstr name=2H7DF
6.351252 = (MATCH) boost(*:*,query(specialties_ids:
#1;#0;#0;#0;#0;#0;#0;#0;#0; ,def=0.0)), product of:
  1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm
  6.351252 = query(specialties_ids: #1;#0;#0;#0;#0;#0;#0;#0;#0;
,def=0.0)=6.351252
/str


defType=edismax&boost=query($param)&param=multi_field:87
--


We like the boost parameter in SOLR 3.5 with eDismax.

The question we have is that we would like to replace bq with boost, but we
get the multi-valued field issue when we try to do the equivalent queries…
HTTP ERROR 400
Problem accessing /solr/providersearch/select. Reason:
can not use FieldCache on multivalued field: specialties_ids


q=*:*&bq=multi_field:87^2&defType=dismax

How do you do this using boost?

q=*:*&boost=multi_field:87&defType=edismax

We know we can use bq with edismax, but we like the multiply feature of
boost.

If I change it to a single valued field I get results, but they are all 1.0.

str name=YFFL5
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm
/str

q=*:*&boost=single_field:87&defType=edismax  // this works, but I need it on
multivalued






Re: Performance issue: Frange with geodist()

2011-10-15 Thread Bill Bell
I added a Jira issue for this:

https://issues.apache.org/jira/browse/SOLR-2840



On 10/13/11 8:15 AM, Yonik Seeley yo...@lucidimagination.com wrote:

On Thu, Oct 13, 2011 at 9:55 AM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 is it possible with geofilt and facet.query?

 facet.query={!geofilt pt=45.15,-93.85 sfield=store d=5}

Yes, that should be both possible and faster... something along the lines
of:
sfield=store&pt=45.15,-93.85
facet.query={!geofilt d=10 key=d10}
facet.query={!geofilt d=20 key=d20}
facet.query={!geofilt d=50 key=d50}

Eventually we should implement range faceting over functions and also
add a max distance you care about to the geodist function.

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference


 On Thu, Oct 13, 2011 at 4:20 PM, roySolr royrutten1...@gmail.com
wrote:

 I don't want to use some basic facets. When the user doesn't get any
 results
 i want
 to search in the radius of his search location. Example:

 apple store in Manchester gives no result. I want this:

 Click here to see 2 results in a radius of 10km.
 Click here to see 11 results in a radius of 50km.
 Click here to see 19 results in a radius of 100km.

 With geodist() and facet.query is this possible but the performance
isn't
 very good..


 --
 View this message in context:
 
http://lucene.472066.n3.nabble.com/Performance-issue-Frange-with-geodist
-tp3417962p3418429.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Sincerely yours
 Mikhail (Mike) Khludnev
 Developer
 Grid Dynamics
 tel. 1-415-738-8644
 Skype: mkhludnev
 http://www.griddynamics.com
  mkhlud...@griddynamics.com





Re: Scoring of DisMax in Solr

2011-10-05 Thread Bill Bell
This seems like a bug to me.

On 10/4/11 6:52 PM, David Ryan help...@gmail.com wrote:

Hi,


When I examine the score calculation of DisMax in Solr,   it looks to me
that DisMax is using  tf x idf^2 instead of tf x idf.
Does anyone have insight why tf x idf is not used here?

Here is the score contribution from one one field:

score(q,c) =  queryWeight x fieldWeight
   = tf x idf x idf x queryNorm x fieldNorm

Here is the example that I used to derive the formula above. Clearly, idf
is
multiplied twice in the score calculation.
*
http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score
*

str name=6H500F0
0.18314168 = (MATCH) sum of:
  0.18314168 = (MATCH) weight(text:gb in 1), product of:
0.35845062 = queryWeight(text:gb), product of:
  2.3121865 = idf(docFreq=6, numDocs=26)
  0.15502669 = queryNorm
0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
  1.4142135 = tf(termFreq(text:gb)=2)
  2.3121865 = idf(docFreq=6, numDocs=26)
  0.15625 = fieldNorm(field=text, doc=1)
/str


Thanks!




Re: Solr stopword problem in Query

2011-09-26 Thread Bill Bell
This is a pretty serious issue.

Bill Bell
Sent from mobile


On Sep 26, 2011, at 4:09 AM, Isan Fulia isan.fu...@germinait.com wrote:

 Hi all,
 
 I have a text field named* textForQuery* .
 Following content has been indexed into solr in field textForQuery
 *Coke Studio at MTV*
 
 when i fired the query as
 *textForQuery:(coke studio at mtv)* the results showed 0 documents
 
 After runing the same query in debugMode i got the following results
 
 result name=response numFound=0 start=0/
 lst name=debug
 str name=rawquerystringtextForQuery:(coke studio at mtv)/str
 str name=querystringtextForQuery:(coke studio at mtv)/str
 str name=parsedqueryPhraseQuery(textForQuery:coke studio ? mtv)/str
 str name=parsedquery_toStringtextForQuery:coke studio *? *mtv/str
 
  Why did the query not match any document even when there is a document
  with the value of textForQuery as *Coke Studio at MTV*?
 Is this because of the stopword *at* present in stopwordList?
 
 
 
 -- 
 Thanks  Regards,
 Isan Fulia.


Re: indexing a xml file

2011-09-24 Thread Bill Bell
Send us the example solr.xml and schema.xml. You are missing fields
in the schema.xml that you are referencing.

On 9/24/11 8:15 AM, ahmad ajiloo ahmad.aji...@gmail.com wrote:

hello
The Solr Tutorial page explains how to index an xml file, but when I try to
index
an xml file with this command:
~/Desktop/apache-solr-3.3.0/example/exampledocs$ java -jar post.jar
solr.xml
I get this error:
SimplePostTool: FATAL: Solr returned an error #400 ERROR:unknown field
'name'

can anyone help me?
thanks




Best Solr escaping?

2011-09-24 Thread Bill Bell
What is the best algorithm for escaping strings before sending to Solr? Does
someone have some code?

A few things I have witnessed in q using DIH handler
* Double quotes that are not balanced can cause several issues, from an
error (strip the double quote?), to no results.
* Should we use + or %20 - and what cases make sense:
 * Dr. Phil Smith or Dr.+Phil+Smith or Dr.%20Phil%20Smith - also what is
  the impact of double quotes?
* Unmatched parentheses, i.e. opening ( and not closing:
 * (Dr. Holstein
 * Cardiologist+(Dr. Holstein
Regular encoding of strings does not always work for the whole string due to
several issues like white space:
* White space works better when we escape it with a backslash, Bill\ Bell, especially
when using facets.

Thoughts? Code? Ideas? Better Wikis?
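
One starting point (not a complete answer) is SolrJ's ClientUtils.escapeQueryChars, or a hand-rolled equivalent; a minimal Java sketch with a hypothetical input string:

import org.apache.solr.client.solrj.util.ClientUtils;

// escape Lucene/Solr query metacharacters (quotes, parens, whitespace, etc.)
// before embedding user-typed text in q or fq
String raw = "Cardiologist (Dr. Holstein";
String safe = ClientUtils.escapeQueryChars(raw);
// safe is roughly: Cardiologist\ \(Dr.\ Holstein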





Re: Search query doesn't work in solr/browse pnnel

2011-09-24 Thread Bill Bell
Yes. It appears that & cannot be encoded in the URL, or there are really
bad results.
For example we get an error on first request, but if we refresh it goes
away.



On 9/23/11 2:57 PM, hadi md.anb...@gmail.com wrote:

When I create a query like something&fl=content in solr/browse, the & and
= in the URL are converted to %26 and %3D and no results occur. But it works in
solr/admin advanced search and also in the URL bar directly. How can I solve
this problem?  Thanks
this problem?  Thanks

--
View this message in context:
http://lucene.472066.n3.nabble.com/Search-query-doesn-t-work-in-solr-brows
e-pnnel-tp3363032p3363032.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Distinct elements in a field

2011-09-17 Thread Bill Bell
SOLR-2242 can do it.

On 9/16/11 2:15 AM, swiss knife swiss_kn...@email.com wrote:

I could get this number by using

 group.ngroups=true&group.limit=0

 but doing grouping for this seems like overkill

 Would you advise using JIRA SOLR-1814 ?

- Original Message -
From: swiss knife
Sent: 09/15/11 12:43 PM
To: solr-user@lucene.apache.org
Subject: Distinct elements in a field

 Simple question: I want to know how many distinct elements I have in a
field, among documents that satisfy a query. Do you know if there's a way to do it
today in 3.4. I saw SOLR-1814 and SOLR-2242. SOLR-1814 seems fairly easy
to use. What do you think ? Thank you




Re: Re; DIH Scheduling

2011-09-12 Thread Bill Bell
You can easily use cron with curl to do what you want to do.
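
For instance, a crontab sketch along these lines (URL, core name and schedule are hypothetical):

# delta-import every 15 minutes, clean full-import nightly at 2am
*/15 * * * * curl -s "http://localhost:8983/solr/core1/dataimport?command=delta-import" > /dev/null
0 2 * * *    curl -s "http://localhost:8983/solr/core1/dataimport?command=full-import&clean=true" > /dev/null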

On 9/12/11 2:47 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:

I don't see anywhere in:
http://issues.apache.org/jira/browse/SOLR-2305
any statement that shows the code's inclusion was decided against.
When did this happen, and what is needed from the community before
someone with the powers to do so will actually commit this?

2011/6/24 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com

 On Thu, Jun 23, 2011 at 9:13 PM, simon mtnes...@gmail.com wrote:
  The Wiki page describes a design for a scheduler, which has not been
  committed to Solr yet (I checked). I did see a patch the other day
  (see https://issues.apache.org/jira/browse/SOLR-2305) but it didn't
  look well tested.
 
  I think that you're basically stuck with something like cron at this
  time. If your application is written in java, take a look at the
  Quartz scheduler - http://www.quartz-scheduler.org/

 It was considered and decided against.
 
  -Simon
 



 --
 -
 Noble Paul





Re: pagination with grouping

2011-09-08 Thread Bill Bell
There are 2 use cases:

1. rows=10 means 10 groups.
2. rows=10 means 10 results (regardless of groups).

I thought there was a total number of groups (ngroups) for case #1.

I don't believe case #2 has been coded.

On 9/8/11 2:22 PM, alx...@aim.com alx...@aim.com wrote:


 

 Hello,

When trying to implement pagination as in the case without grouping I see
two issues.
1. with rows=10 solr feed displays 10 groups not 10 results
2. there is no total number of results with grouping  to show the last
page.

In detail:
1. I need to display only 10 results in one page. For example if I have
group.limit=5 and the first group has 5 docs, the second 3 and the third
2 then only these 3 groups must be displayed in the first page.
Currently specifying rows=10, shows 10 groups and if we have 5 docs in
each group then in the first page we will have 50 docs.

2.I need to show the last page, for which I need total number of results
with grouping. For example if I have 5 groups with number of docs 5, 4,
3, 2, 1 then this total number must be 15.

Any ideas how to achieve this.

Thanks in advance.
Alex.







Re: Terms.regex performance issue

2011-08-22 Thread Bill Bell
We do something like:

http://localhost:8983/solr/provs/terms?terms.fl=payor&terms.regex.flag=case_insensitive&terms.regex=%28.*%29WHAT USER TYPES%28.*%29&terms.limit=-1


We want not just prefix but anywhere in the terms.



On 8/19/11 5:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


: Subject: Terms.regex performance issue
: 
: As I want to use it in an Autocomplete it has to be fast. Terms.prefix
gets
: results in around 100 milliseconds, while terms.regex is 10 to 20 times
: slower.

can you elaborate on how you are using terms.regex?  what does your regex
look like? .. particularly if your usecase is autocomplete terms.prefix
seems like an odd choice.

Possible XY Problem?
https://people.apache.org/~hossman/#xyproblem

Have you looked at using the Suggester plugin?

https://wiki.apache.org/solr/Suggester


-Hoss




Re: hierarchical faceting in Solr?

2011-08-22 Thread Bill Bell
Naomi,

Just create a login and update it!!


On 8/22/11 12:27 PM, Erick Erickson erickerick...@gmail.com wrote:

Try searching the Solr user's list for hierarchical, this topic
has been discussed numerous times.

It would be great if you could collate the various solutions
and update the wiki, all you have to do is create a
login...

Best
Erick

On Mon, Aug 22, 2011 at 1:49 PM, Naomi Dushay ndus...@stanford.edu
wrote:
 Chris,

 Is there a document somewhere on how to do this?  If not, might you
create
 one?   I could even imagine such a document living on the Solr wiki ...
  this one has mostly ancient content:

 http://wiki.apache.org/solr/HierarchicalFaceting

 - Naomi





Re: copyField for big indexes

2011-08-22 Thread Bill Bell
It depends.

copyField may be good if you want to copy into a Soundex field, and then
boost the soundexed field lower than the tokenized field.
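
A rough schema.xml sketch of that pattern (field and type names are hypothetical):

<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- one of the phonetic encoders that ships with Solr -->
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>

<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="name_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<copyField source="name" dest="name_phonetic"/>

Then something like qf=name^2 name_phonetic^0.5 on a dismax handler keeps the phonetic match boosted lower.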

What are you trying to do ?

On 8/22/11 11:14 AM, Tom springmeth...@gmail.com wrote:

Is it a good rule of thumb, that when dealing with large indexes copyField
should not be used.  It seems to duplicate the indexing of data.

You don't need copyField to be able to search on multiple fields.
Example,
if I have two fields: title and post and I want to search on both, I could
just query 
title:word OR post:word

So it seems to me if you have lots of data and a large index, copyField
should be avoided.

Any thoughts?

--
View this message in context:
http://lucene.472066.n3.nabble.com/copyField-for-big-indexes-tp3275712p327
5712.html
Sent from the Solr - User mailing list archive at Nabble.com.




Boost or BQ?

2011-08-22 Thread Bill Bell
What is the difference between boost= and bq=?

I cannot find any documentation…
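
From what the later edismax threads in this digest suggest, bq adds the score of an extra boost query to the main score, while boost multiplies the main score by a function query; hypothetical request sketches:

q=pizza&defType=edismax&bq=state:CO^2      (additive boost query)
q=pizza&defType=edismax&boost=popularity   (multiplicative function boost)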




Score

2011-08-15 Thread Bill Bell
How do I change the score to scale it between 0 and 100 regardless of the
raw score?

q.alt=*:*&bq=lang:Spanish&defType=dismax

Bill Bell
Sent from mobile



Loggly support

2011-08-14 Thread Bill Bell
How do you setup log4j to work with Loggly for SOLR logs?

Anyone have this set up?

Bill





Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Bill Bell
OK,

I'll ask the elephant in the room…

What is the difference between the new UpdateHandler from Mark and the
SOLR-RA?

The UpdateHandler works with 4.0 does SOLR-RA work with 4.0 trunk?

Pros/Cons?


On 8/14/11 8:10 PM, Nagendra Nagarajayya nnagaraja...@transaxtions.com
wrote:

Naveen:

NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a
document to become searchable. Any document that you add through update
becomes  immediately searchable. So no need to commit from within your
update client code.  Since there is no commit, the cache does not have
to be cleared or the old searchers closed or  new searchers opened, and
warmed (error that you are facing).

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



On 8/14/2011 10:37 AM, Naveen Gupta wrote:
 Hi Mark/Erick/Nagendra,

 I was not very confident about NRT at that point of time, when we
started
 project almost 1 year ago, definitely i would try NRT and see the
 performance.

 The current requirement was working fine till we were using
commitWithin 10
 millisecs in the XMLDocument which we were posting to SOLR.

 But due to that, we were getting very poor performance (almost 3 mins for
 15,000 docs) per user. There are many parallel users committing to our
SOLR.

 So we removed the commitWithin, and hence performance was much much
better.

 But then we are getting this maxWarmingSearcher Error, because we are
 committing separately as a curl request after once entire doc is
submitted
 for indexing.

 The question here is: what is the difference between commitWithin and commit
 (apart from the fact that commit takes memory and processes and
additional
 hardware usage)

 Why we want it to be visible as soon as possible, since we are applying
many
 business rules on top of the results (older indexes as well as new one)
and
 apply different filters.

 Up to 5 mins is fine for us, but beyond that we need to think about other
 optimizations.

 We will definitely try NRT. But please tell me other options which we
can
 apply in order to optimize.?

 Thanks
 Naveen


 On Sun, Aug 14, 2011 at 9:42 PM, Erick
Ericksonerickerick...@gmail.comwrote:

 Ah, thanks, Mark... I must have been looking at the wrong JIRAs.

 Erick

 On Sun, Aug 14, 2011 at 10:02 AM, Mark Millermarkrmil...@gmail.com
 wrote:
 On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:

 You either have to go to near real time (NRT), which is under
 development, but not committed to trunk yet
 NRT support is committed to trunk.

 - Mark Miller
 lucidimagination.com














Re: Cache replication

2011-08-14 Thread Bill Bell
OK. But SOLR has built-in caching. Do you not like the caching? What do
you think we should change in the SOLR cache?

Bill


On 8/10/11 9:16 AM, didier deshommes dfdes...@gmail.com wrote:

Consider putting a cache (memcached, redis, etc) *in front* of your
solr slaves. Just make sure to update it when replication occurs.

didier

On Tue, Aug 9, 2011 at 6:07 PM, arian487 akarb...@tagged.com wrote:
 I'm wondering if the caches on all the slaves are replicated across
(such as
 queryResultCache).  That is to say, if I hit one of my slaves and cache
a
 result, and I make a search later and that search happens to hit a
different
 slave, will that first cached result be available for use?

 This is pretty important because I'm going to have a lot of slaves and
if
 this isn't done, then I'd have a high chance of running a lot uncached
 queries.

 Thanks :)

 --
 View this message in context:
http://lucene.472066.n3.nabble.com/Cache-replication-tp3240708p3240708.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Bill Bell
I understand.

Have you looked at Mark's patch? From his performance tests, it looks
pretty good.

When would RA work better?

Bill


On 8/14/11 8:40 PM, Nagendra Nagarajayya nnagaraja...@transaxtions.com
wrote:

Bill:

The technical details of the NRT implementation in Apache Solr with
RankingAlgorithm (SOLR-RA) are available here:

http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf

(Some changes for Solr 3.x, but for most it is as above)

Regarding support for 4.0 trunk, should happen sometime soon.

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org





On 8/14/2011 7:11 PM, Bill Bell wrote:
 OK,

  I'll ask the elephant in the room...

 What is the difference between the new UpdateHandler from Mark and the
 SOLR-RA?

 The UpdateHandler works with 4.0 does SOLR-RA work with 4.0 trunk?

 Pros/Cons?


 On 8/14/11 8:10 PM, Nagendra
Nagarajayyannagaraja...@transaxtions.com
 wrote:

 Naveen:

  NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a
  document to become searchable. Any document that you add through update
  becomes immediately searchable. So there is no need to commit from within your
  update client code. Since there is no commit, the cache does not have
  to be cleared, the old searchers closed, or new searchers opened and
  warmed (the error that you are facing).

 Regards

 - Nagendra Nagarajayya
 http://solr-ra.tgels.org
 http://rankingalgorithm.tgels.org



 On 8/14/2011 10:37 AM, Naveen Gupta wrote:
 Hi Mark/Erick/Nagendra,

 I was not very confident about NRT at that point of time, when we
 started
 project almost 1 year ago, definitely i would try NRT and see the
 performance.

 The current requirement was working fine till we were using
 commitWithin 10
 millisecs in the XMLDocument which we were posting to SOLR.

 But due to which, we were getting very poor performance (almost 3 mins
 for
  15,000 docs) per user. There are many parallel users committing to our
 SOLR.

 So we removed the commitWithin, and hence performance was much much
 better.

 But then we are getting this maxWarmingSearcher Error, because we are
 committing separately as a curl request after once entire doc is
 submitted
 for indexing.

 The question here is what is difference between commitWithin and
commit
 (apart from the fact that commit takes memory and processes and
 additional
 hardware usage)

 Why we want it to be visible as soon as possible, since we are
applying
 many
 business rules on top of the results (older indexes as well as new
one)
 and
 apply different filters.

 upto 5 mins is fine for us. but more than that we need to think then
 other
 optimizations.

 We will definitely try NRT. But please tell me other options which we
 can
 apply in order to optimize.?

 Thanks
 Naveen


 On Sun, Aug 14, 2011 at 9:42 PM, Erick
 Ericksonerickerick...@gmail.comwrote:

 Ah, thanks, Mark... I must have been looking at the wrong JIRAs.

 Erick

 On Sun, Aug 14, 2011 at 10:02 AM, Mark Millermarkrmil...@gmail.com
 wrote:
 On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:

 You either have to go to near real time (NRT), which is under
 development, but not committed to trunk yet
 NRT support is committed to trunk.

 - Mark Miller
 lucidimagination.com

















Re: SOLR 3.3.0 multivalued field sort problem

2011-08-13 Thread Bill Bell
I have a different use case. Consider a spatial multivalued field with lat/long
values for addresses. I would want sorting by geodist() to return the closest
distance in each group. For example, find me the closest restaurant, with each
doc being a chain name like Pizza Hut. Or doctors with multiple offices.

Bill Bell
Sent from mobile


On Aug 13, 2011, at 12:31 PM, Martijn v Groningen martijn.is.h...@gmail.com 
wrote:

 The first solution would make sense to me. Some kind of a strategy
 mechanism
 for this would allow anyone to define their own rules. Duplicating results
 would be confusing to me.
 
 On 13 August 2011 18:39, Michael Lackhoff mich...@lackhoff.de wrote:
 
 On 13.08.2011 18:03 Erick Erickson wrote:
 
 The problem I've always had is that I don't quite know what
 sorting on multivalued fields means. If your field had tokens
 a and z, would sorting on that field put the doc
 at the beginning or end of the list? Sure, you can define
 rules (first token, last token, average of all tokens (whatever
 that means)), but each solution would be wrong sometime,
 somewhere, and/or completely useless.
 
 Of course it would need rules but I think it wouldn't be too hard to
 find rules that are at least far better than the current situation.
 
 My wish would include an option that decides if the field can be used
 just once or every value on its own. If the option is set to FALSE, only
 the first value would be used, if it is TRUE, every value of the field
 would get its place in the result list.
 
 so, if we have e.g.
 record1: ccc and bbb
 record2: aaa and zzz
 it would be either
 record2 (aaa)
 record1 (ccc)
 or
 record2 (aaa)
 record1 (bbb)
 record1 (ccc)
 record2 (zzz)
 
 I find these two outcomes most plausible so I would allow them if
 technical possible but whatever rule looks more plausible to the
 experts: some solution is better than no solution.
 
 -Michael
 
 
 
 
 -- 
 Met vriendelijke groet,
 
 Martijn van Groningen


Re: ideas for indexing large amount of pdf docs

2011-08-13 Thread Bill Bell
You could send PDF for processing using a queue solution like Amazon SQS. Kick 
off Amazon instances to process the queue.

Once you process with Tika to text just send the update to Solr.
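
For example, once a worker has the extracted text, the post to Solr is just a
plain update call (host, core, id and field names here are hypothetical; in
practice you would commit per batch rather than per document):

  curl 'http://localhost:8983/solr/update?commit=true' \
       -H 'Content-Type: text/xml; charset=utf-8' \
       --data-binary '<add><doc>
         <field name="id">doc-123.pdf</field>
         <field name="text">...text extracted by Tika...</field>
       </doc></add>'

That keeps the heavy Tika work off the Solr boxes, and the Solr side only sees
small text updates.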

Bill Bell
Sent from mobile


On Aug 13, 2011, at 10:13 AM, Erick Erickson erickerick...@gmail.com wrote:

 Yeah, parsing PDF files can be pretty resource-intensive, so one solution
 is to offload it somewhere else. You can use the Tika libraries in SolrJ
 to parse the PDFs on as many clients as you want, just transmitting the
 results to Solr for indexing.
 
  How are all these docs being submitted? Is this some kind of
 on-the-fly indexing/searching or what? I'm mostly curious what
 your projected max ingestion rate is...
 
 Best
 Erick
 
 On Sat, Aug 13, 2011 at 4:49 AM, Rode Gonzalez (libnova)
 r...@libnova.es wrote:
 Hi all,
 
 I want to ask about the best way to implement a solution for indexing a
 large amount of pdf documents between 10-60 MB each one. 100 to 1000 users
 connected simultaneously.
 
 I actually have 1 core of solr 3.3.0 and it works fine for a few number of
 pdf docs but I'm afraid about the moment when we enter in production time.
 
 some possibilities:
 
 i. clustering. I have no experience in this, so it will be a bad idea to
 venture into this.
 
 ii. multicore solution. make some kind of hash to choose one core at each
 query (exact queries) and thus reduce the size of the individual indexes to
 consult or to consult all the cores at same time (complex queries).
 
 iii. do nothing more and wait for the catastrophe in the response times :P
 
 
 Someone with experience can help a bit to decide?
 
 Thanks a lot in advance.
 


Re: Problem with xinclude in solrconfig.xml

2011-08-13 Thread Bill Bell
What was it?

Bill Bell
Sent from mobile


On Aug 10, 2011, at 2:21 PM, Way Cool way1.wayc...@gmail.com wrote:

 Sorry for the spam. I just figured it out. Thanks.
 
 On Wed, Aug 10, 2011 at 2:17 PM, Way Cool way1.wayc...@gmail.com wrote:
 
 Hi, Guys,
 
 Based on the document below, I should be able to include a file under the
 same directory by specifying relative path via xinclude in solrconfig.xml:
 http://wiki.apache.org/solr/SolrConfigXml
 
 However I am getting the following error when I use relative path (absolute
 path works fine though):
 SEVERE: org.xml.sax.SAXParseException: Error attempting to parse XML file
 
 Any ideas?
 
 Thanks,
 
 YH
 


Re: getting result count only

2011-08-06 Thread Bill Bell
q=*:*&rows=0
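
For example (host and core path are hypothetical):

  http://localhost:8983/solr/select?q=*:*&rows=0&wt=json

No documents are returned; the count is reported as numFound in the response.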



On 8/6/11 8:24 PM, Jason Toy jason...@gmail.com wrote:

How can I run a query to get the result count only? I only need the count
and so I dont need solr to send me all the results back.




Re: Problem with making Solr query

2011-08-05 Thread Bill Bell
The string field type does no analysis or tokenization, so only exact, full-value
matches will hit. You might need to switch the field type to a tokenized text type.
Also make sure your default search field is title, or add title:implementation to
your query.
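
For example, a minimal tokenized type for the title field could be (a sketch,
adjust the analyzer to your needs):

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="title" type="text_general" indexed="true" stored="true" required="true"/>

After reindexing, http://localhost:8983/solr/db/select/?q=title:implementation should match.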

Bill Bell
Sent from mobile


On Aug 5, 2011, at 8:43 AM, dhryvastov dhryvas...@serena.com wrote:

 Hi -
 
 I am new to Solr and Lucene and I have started to research its capabilities
 this week. My current task seems very simple (and I believe it is) but I
 have some issue.
 
 I have successfully done indexing of MSSQL database table. The table has
 probably 20 fields and I have indexed two of them: id and title.
  The question is: how can I get all records from this table (I mean their ids)
  where the word specified in the search appears?
 
 I send the following get request to get result:
 http://localhost:8983/solr/db/select/?q=implementation. The response
 contains 0 results (numFound=0) but there are at least 5 records among the
 first 10 which contains this word in its title.
 
 My schema.xml contains:
 fields
   field name=id type=string indexed=true stored=true
 required=true / 
   field name=title type=string indexed=true stored=true
 required=true / 
 /fields
 
 What get request should I do to get the expected results?
 
  I feel that I have omitted something simple, but it is the second day and I
  still can't find what.
 Please help.
 
 Thanks for your response.
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Problem-with-making-Solr-query-tp3228877p3228877.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spatial Search and Highlighting

2011-08-01 Thread Bill Bell
I think 4.0 supports fl=geodist()
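
On trunk the pseudo-field syntax is roughly (parameter values assumed):

  fl=*,score,dist:geodist()&sfield=store&pt=45.17614,-93.87341

On 3.x you still have to compute the distance client-side, as David describes below.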

On 8/1/11 3:47 PM, Ralf Musick ra...@gmx.de wrote:

Hi David,

So that As a temporary workaround for older Solr versions, it's
possible to obtain distances by using geodist or geofilt as the only
scoring part of the main query
and Highlighting do not fit together, right?

Ok, than I have to calculate the distance by my own.

Thank you very much for your given information!!

Best regards,
  Ralf



 On 01.08.2011 23:30, Smiley, David W. wrote:
 Ralf,

 Highlighting (and search relevancy -- the score) is performed on the
user query which must be in the q parameter.  In your case, I see you
placed your geospatial query there and you put your user query into a
filter query fq.  You have them reversed.

 You stated that the returning the distance information on the wiki
didn't work -- that's because those instructions are for Solr 4.0 (not
released yet) -- notice the warning symbol. I recommend that you
calculate the distance yourself since Solr 4.0 isn't out yet. There is
plenty of information on the web on how to calculate the distance
between two lat-lon points using the Haversine algorithm.

 ~ David

 On Aug 1, 2011, at 5:00 PM, Ralf Musick wrote:

 Hi David,

 an example is:
 
http://localhost:8983/solr/browse?indent=on&hl=on&hl.fl=name,manu&sort=score+asc&sfield=store&json.nl=map&wt=json&rows=10&start=0&q={!func}geodist%28%29&pt=45.17614%2C-93.87341&fq=%28name%20:+%28canon%29%29^8

 I have to say I need the calculated distance as a return field (score)
 in the result list.
 The pseudo field solution described here
 http://wiki.apache.org/solr/SpatialSearch#Returning_the_distance did
not
 word, so I created the query above.

 Thanks,
   Ralf


  On 01.08.2011 22:21, Smiley, David W. wrote:
 Can you demonstrate the bug against the example data?  If so, provide
the URL please.
 ~ David

 On Aug 1, 2011, at 4:00 PM, Ralf Musick wrote:

 Hi,

 I combined a spatial distance search with a fulltext search as
described
 in
 
http://wiki.apache.org/solr/SpatialSearch#geodist_-_The_distance_funct
ion .
 I'm using solr 3.3 and that works fine.
 BUT, I want to use highlighting of fulltext query words but that does
 not work.

 Before solr 3.3, I used solr 1.4 with Spatial Search plugin from
Jteam
 and that works fine also with highlighting.

 After refactoring because of API change I miss the highlighting
feature.

 Is that a known issue? Or what is my mistake/ I have to do?

 Example Query:
 INFO: [organisations] webapp=/solr path=/select
 
params={hl.fragsize=250&sort=score+asc&sfield=store_lat_lon&json.nl=map&hl.fl=name,category_name&wt=json&hl=on&rows=10&fl=id,name,street,city,score,lat,lng&start=0&q={!func}geodist()&pt=52.5600917,13.4222482&fq=((country_name:+(automatisierung))^8+OR+(category_name:+(automatisierung))^10+OR+(sub_category_name:+(automatisierung))^10}
 hits=37 status=0 QTime=7


 Thanks is Advance,
   Ralf






Re: sort by function

2011-08-01 Thread Bill Bell
This seems like a bug.

On 8/1/11 7:47 AM, Jamie Johnson jej2...@gmail.com wrote:

I've never tried but could it be sort=sum(field1,field2,field3)%20desc

On Mon, Aug 1, 2011 at 9:43 AM, Gastone Penzo gastone.pe...@gmail.com
wrote:
 Hi,
 i need to order by function like:

 sort=sum(field1,field2,field3)+desc

 but solr gives me this error:
 Missing sort order.
 why is this possible? i read that is possible to order by function,
from version 1.3
 (http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function)

 i use version 1.4

 nobody has an idea?

 thanx


 Gastone




Re: How can i find a document by a special id?

2011-07-20 Thread Bill Bell
Why not just search the 2 fields?

q=*:*&fq=mediacode:AB OR id:123456

You could take the user input and replace it:

q=*:*&fq=mediacode:$input OR id:$input

Of course you can also use dismax and wrap with an OR.
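
A minimal dismax variant (assuming both fields are indexed and the user input is a
single token) could be:

  q=$input&defType=dismax&qf=mediacode id

A document matching the token in either field is returned.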

Bill Bell
Sent from mobile


On Jul 20, 2011, at 3:38 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

 
  : On 20.07.2011 19:23, Kyle Lee wrote:
 :  Is the mediacode always alphabetic, and is the ID always numeric?
 :  
 : No sadly not. We expose our products on too many medias :-).
 
 If i'm understanding you correctly, you're saying even the prefix AB is 
 not special, that there could be any number of prefixes identifying 
  different mediacodes? and the product ids aren't all numeric?
 
 your question seems  absurd.  
 
 I can only assume that I am horribly missunderstanding your situation.  
 (which is very easy to do when you only have a single contrieved piece of 
 example data to go on)
 
 As a general rule, it's not a good idea to think about Solr in the same 
 way as a relational database, but Perhaps if you imagine for a moment that 
 your Solr index *was* a (read only) relational database, with each 
 solr field corrisponding to a column in your DB, and then you described in 
 psuedo-code/sql how you would go about doing the types of id lookups you 
 want to do, it might give us a better idea of your situation so we can 
 suggest an approach for dealing with it.
 
 
 -Hoss


Re: Data Import from a Queue

2011-07-20 Thread Bill Bell
Yes this is a good reason for using a queue. I have used Amazon SQS this way 
and it was simple to set up.

Bill Bell
Sent from mobile


On Jul 20, 2011, at 2:59 AM, Stefan Matheis matheis.ste...@googlemail.com 
wrote:

 Brandon,
 
 i don't know how they are using it in detail, but Part of Chef's Architecture 
 is this one:
 
 Chef Server - RabbitMQ - Chef Solr Indexer - Solr
 http://wiki.opscode.com/download/attachments/7274878/chef-server-arch.png
 
 Perhaps not exactly, what you're looking for - but may give you an idea?
 
 Regards
 Stefan
 
 Am 19.07.2011 19:04, schrieb Brandon Fish:
 Let me provide some more details to the question:
 
 I was unable to find any example implementations where individual documents
 (single document per message) are read from a message queue (like ActiveMQ
 or RabbitMQ) and then added to Solr via SolrJ, a HTTP POST or another
 method. Does anyone know of any available examples for this type of import?
 
 If no examples exist, what would be a recommended commit strategy for
 performance? My best guess for this would be to have a queue per core and
 commit once the queue is empty.
 
 Thanks.
 
 On Mon, Jul 18, 2011 at 6:52 PM, Erick 
 Ericksonerickerick...@gmail.comwrote:
 
 This is a really cryptic problem statement.
 
 you might want to review:
 
 http://wiki.apache.org/solr/UsingMailingLists
 
 Best
 Erick
 
 On Fri, Jul 15, 2011 at 1:52 PM, Brandon Fishbrandon.j.f...@gmail.com
 wrote:
 Does anyone know of any existing examples of importing data from a queue
 into Solr?
 
 Thank you.
 
 
 


Re: configure dismax requesthandlar for boost a field

2011-07-05 Thread Bill Bell
Add score to the fl parameter.

fl=*,score
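
If you want the boosts applied consistently, a minimal dismax handler sketch
(boost values are only illustrative) is:

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">name^2.0 description^0.5</str>
      <str name="fl">*,score</str>
    </lst>
  </requestHandler>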


On 7/4/11 11:09 PM, Romi romijain3...@gmail.com wrote:

I am not returning the score for the queries, as I suppose it should be reflected
in the search results: a doc having the query string in the description field should
come higher than a doc having the query string in the name field.

And yes i restarted solr after making changes in configuration.

-
Thanks  Regards
Romi
--
View this message in context:
http://lucene.472066.n3.nabble.com/configure-dismax-requesthandlar-for-boost-a-field-tp3137239p3139680.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Exception when using result grouping and sorting by geodist() with Solr 3.3

2011-07-05 Thread Bill Bell
Did you add: fq={!geofilt} ??

On 7/3/11 11:14 AM, Thomas Heigl tho...@umschalt.com wrote:

Hello,

I just tried up(down?)grading our current Solr 4.0 trunk setup to Solr
3.3.0
as result grouping was the only reason for us to stay with the trunk.
Everything worked like a charm except for one of our queries, where we
group
results by the owning user and sort by distance.

A simplified example for my query (that still fails) looks like this:

q=*:*&group=true&group.field=user.uniqueId_s&group.main=true&group.format=grouped&sfield=user.location_p&pt=48.20927,16.3728&sort=geodist() asc


The exception thrown is:

Caused by: org.apache.solr.common.SolrException: Unweighted use of sort
 geodist(latlon(user.location_p),48.20927,16.3728)
 at
 
org.apache.solr.search.function.ValueSource$1.newComparator(ValueSource.j
ava:106)
 at org.apache.lucene.search.SortField.getComparator(SortField.java:413)
 at
 
org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.ini
t(AbstractFirstPassGroupingCollector.java:81)
 at
 
org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.init(T
ermFirstPassGroupingCollector.java:56)
 at
 
org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Gro
uping.java:587)
 at org.apache.solr.search.Grouping.execute(Grouping.java:256)
 at
 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.j
ava:237)
 at
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchH
andler.java:194)
 at
 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBa
se.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
 at
 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(Embedded
SolrServer.java:140)
 ... 39 more


Any ideas how to fix this or work around this error for now? I'd really
like
to move from the trunk to the stable 3.3.0 release and this is the only
problem currently keeping me from doing so.

Cheers,

Thomas




Re: faceting on field with two values

2011-07-05 Thread Bill Bell
The easiest way is to concat() the fields in SQL, and pass it to indexing
as one field already merged together.
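
For example (the table name is hypothetical; CONCAT is MySQL-style, other databases
use || instead):

  SELECT id, TOWN, POSTALCODE,
         CONCAT(TOWN, ' ', POSTALCODE) AS TOWN_POSTALCODE
  FROM   your_table

Then TOWN_POSTALCODE can be a plain single-valued string field and no copyField is
needed.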

Thanks,

On 7/5/11 1:12 AM, elisabeth benoit elisaelisael...@gmail.com wrote:

Hello,

I have two fields TOWN and POSTALCODE and I want to concat those two in
one
field to do faceting

My two fields  are declared as followed:

field name=TOWN type=string indexed=true stored=true/
field name=POSTALCODE type=string indexed=true stored=true/

The concat field is declared as followed:

field name=TOWN_POSTALCODE type=string indexed=true stored=true
multiValued=true/

and I do the copyfield as followed:

   copyField source=TOWN dest=TOWN_POSTALCODE/
   copyField source=POSTALCODE dest=TOWN_POSTALCODE/


When I do faceting on TOWN_POSTALCODE field, I only get answers like

lst name=TOWN_POSTALCODE
int name=622005/int
int name=622805/int
int name=boulogne sur mer5/int
int name=saint martin boulogne5/int
...

Which means the faceting is done on the TOWN part or the POSTALCODE part
of
TOWN_POSTALCODE.

But I would like to have answers like

lst name=TOWN_POSTALCODE
int name=boulogne sur mer 622005/int
int name=paris 750165/int

Is this possible with Solr?

Thanks,
Elisabeth




Re: How many fields can SOLR handle?

2011-07-05 Thread Bill Bell
This is taxonomy/index design...

One way is to have a series of fields by category:

TV - tv_size, resolution
Computer - cpu, gpu

Solr can have as many fields as you need; fields that are defined but not
populated for a given document are simply ignored in the index.
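
A minimal schema sketch for this approach (types assumed, adjust to your data):

  <field name="tv_size"    type="string" indexed="true" stored="true"/>
  <field name="resolution" type="string" indexed="true" stored="true"/>
  <field name="cpu"        type="string" indexed="true" stored="true"/>
  <field name="gpu"        type="string" indexed="true" stored="true"/>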

So if a user picks TV, you pass these to Solr:

q=*:*&facet=true&facet.field=tv_size&facet.field=resolution

If a user picks Computer, you pass these to Solr:

q=*:*&facet=true&facet.field=cpu&facet.field=gpu

The other option is to return ALL of the fields facet'd, but this is not
recommended since
you would certainly have performance issues depending on the number of
fields.





On 7/5/11 1:00 AM, roySolr royrutten1...@gmail.com wrote:

Hi,

I know I can add components to my requesthandler. In this situation facets
are dependent on their category. So if a user chooses the category TV:

Inch:
32 inch(5)
34 inch(3)
40 inch(1)

Resolution:
Full HD(5)
HD ready(2)

When a user searches for the category Computer:

CPU:
Intel(12)
AMD(10)

GPU:
Ati(5)
Nvidia(2)

So i can't put it in my requesthandler as default. Every search there can
be
other facets. Do you understand what i mean?

--
View this message in context:
http://lucene.472066.n3.nabble.com/How-many-fields-can-SOLR-handle-tp3033910p3139833.html
Sent from the Solr - User mailing list archive at Nabble.com.



