Re: System requirements in my case?

2012-05-22 Thread findbestopensource
A dedicated server may not be required. If you want to cut down costs,
prefer a shared server.

How much RAM does it have?

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 12:36 PM, Bruno Mannina bmann...@free.fr wrote:

 Dear Solr users,

 My company would like to use Solr to index around 80 000 000 documents
 (XML files of roughly 5-10 KB each).
 My program (a robot) will connect to this Solr instance with boolean requests.

 Number of users: around 1000
 Number of requests per user per day: 300
 Number of users per day: 30

 I would like to subscribe to a host provider with this configuration:
 - Dedicated Server
 - Ubuntu
 - Intel Xeon i7, 2x 2.66+ GHz, 12 GB RAM, 2 x 1500 GB disk
 - Unlimited bandwidth
 - Fixed IP

 Do you think this configuration is enough?

 Thanks for your info,
 Sincerely
 Bruno



Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread findbestopensource
That's how de-normalization works: you need to update all the child products.

If you just need the counts and you are using facets, then maintain a
mapping between category and main product, and between main product and
child product. The Lucene index has no fixed schema, so you can store both
record types side by side and retrieve each record based on its type.

A category record will have CategoryName, ProductName and a type
(CATEGORY_TYPE).
A child product record will have ProductName, MainProductName,
ProductDetails and a type (PRODUCT_TYPE).

With this layout you need two queries: given the category name, fetch the
main product names, then query with each of those to fetch the child
products. Hope it helps.
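
A minimal SolrJ sketch of those two queries; this is not the poster's
actual code, and the field names, type values and Solr URL are
illustrative (client shown is the 1.4/3.x-era CommonsHttpSolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class CategoryCounts {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Query 1: given the category name, fetch the main product names.
        SolrQuery catQuery = new SolrQuery("CategoryName:electronics AND type:CATEGORY_TYPE");
        QueryResponse catRsp = server.query(catQuery);
        for (SolrDocument catDoc : catRsp.getResults()) {
            String mainProduct = (String) catDoc.getFieldValue("ProductName");

            // Query 2: fetch (or just count) the child products of that main product.
            SolrQuery childQuery = new SolrQuery(
                    "MainProductName:\"" + mainProduct + "\" AND type:PRODUCT_TYPE");
            long childCount = server.query(childQuery).getResults().getNumFound();
            System.out.println(mainProduct + ": " + childCount + " child products");
        }
    }
}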

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 1:37 PM, Sohail Aboobaker sabooba...@gmail.com wrote:

 Hi,

 I have a very basic question and hopefully there is a simple answer to
 this. We are trying to index a simple product catalog which has a master
 product and child products. Each master product can have multiple child
 products. A master product can be assigned one or more product categories.
 Now, we need to be able to show counts of categories based on number of
 child products in each category. We have indexed data using a join and
 selecting appropriate values for index from each table. This is basically a
 De-normalized result set. It works perfectly for our search purposes.
 However, maintaining the index and keeping it up to date is an issue.
 Whenever a product master is updated with a new category, we need to
 delete all the index entries for its child products and insert them
 again. This seems like a lot of activity for a regular ongoing operation,
 i.e. product category updates.

 Since join between schemas is only available in 4.0, what other
 strategies are there to maintain such an index or to build such queries?

 Thanks for your help.

 Regards,
 Sohail



Re: Multicore Solr

2012-05-22 Thread findbestopensource
Having a core per user is not a good idea; the count is too high. Keep
everything in a single core and filter the data based on the user name or
user id.
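
For example, with SolrJ you could keep one core and restrict every query
with a filter query on a user id field (a sketch; the field name user_id,
the query term and the URL are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PerUserSearch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("laptop");
        // fq is applied on top of the main query and cached independently,
        // so per-user filtering stays cheap.
        query.addFilterQuery("user_id:12345");
        QueryResponse rsp = server.query(query);
        System.out.println("hits for this user: " + rsp.getResults().getNumFound());
    }
}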

Regards
Aditya
www.findbestopensource.com



On Tue, May 22, 2012 at 2:29 PM, Shanu Jha shanuu@gmail.com wrote:

 Hi all,

 greetings from my end. This is my first post on this mailing list. I have
 a few questions on multicore Solr. For background, we want to create a core
 for each user logged in to our application. That could be 50, 100, 1000 or
 any number of cores. Each core would be used to write and search an index
 in real time.

 1. Is this a good idea to go with?
 2. What are the pros and cons of this approach?

 Awaiting your response.

 Regards
 AJ



Re: System requirements in my case?

2012-05-22 Thread findbestopensource
Seems to be fine. Go ahead.

Before hosting, have you tried / tested your application in a local setup?
RAM usage is what matters most for Solr. Benchmark your app with 100 000
documents, log the memory used, and from that calculate the RAM required
for 80 000 000 documents.
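
As a rough illustration (numbers assumed, not measured): if the 100 000-
document benchmark shows the JVM using X MB of heap for the index, a naive
linear extrapolation gives X x 800 for 80 000 000 documents. Treat that as
an upper bound; Lucene keeps the index itself on disk, so memory usually
grows sub-linearly, and benchmarking at two or three index sizes gives a
more realistic curve.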

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 2:36 PM, Bruno Mannina bmann...@free.fr wrote:

 My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml

 24 GB DDR3

 On 22/05/2012 10:26, findbestopensource wrote:

  A dedicated server may not be required. If you want to cut down costs,
 prefer a shared server.

 How much RAM does it have?

 Regards
 Aditya
 www.findbestopensource.com


 On Tue, May 22, 2012 at 12:36 PM, Bruno Mannina bmann...@free.fr wrote:

  Dear Solr users,

 My company would like to use Solr to index around 80 000 000 documents
 (XML files of roughly 5-10 KB each).
 My program (a robot) will connect to this Solr instance with boolean requests.

 Number of users: around 1000
 Number of requests per user per day: 300
 Number of users per day: 30

 I would like to subscribe to a host provider with this configuration:
 - Dedicated Server
 - Ubuntu
  - Intel Xeon i7, 2x 2.66+ GHz, 12 GB RAM, 2 x 1500 GB disk
 - Unlimited bandwidth
 - Fixed IP

 Do you think this configuration is enough?

 Thanks for your info,
 Sincerely
 Bruno





Re: is commit a sequential process in solr indexing

2012-05-22 Thread findbestopensource
Yes. Lucene / Solr supports multi-threaded use. You can commit from two
different threads to the same core or to different cores.
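
A sketch of the two-core case (core names and URLs are illustrative;
client shown is the 1.4/3.x-era CommonsHttpSolrServer):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ParallelCommit {
    public static void main(String[] args) throws Exception {
        final SolrServer core0 = new CommonsHttpSolrServer("http://localhost:8983/solr/core0");
        final SolrServer core1 = new CommonsHttpSolrServer("http://localhost:8983/solr/core1");

        Thread t0 = new Thread(new Runnable() {
            public void run() {
                try { core0.commit(); } catch (Exception e) { e.printStackTrace(); }
            }
        });
        Thread t1 = new Thread(new Runnable() {
            public void run() {
                try { core1.commit(); } catch (Exception e) { e.printStackTrace(); }
            }
        });
        t0.start(); t1.start();
        t0.join(); t1.join();  // wait for both commits to finish
    }
}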

Regards
Aditya
www.findbestopensource.com

On Tue, May 22, 2012 at 12:35 AM, jame vaalet jamevaa...@gmail.com wrote:

 hi,
 my use case here is to search all the incoming documents for certain
 pre-determined combinations of words. So what I am doing here is:
 create a batch of x docs according to their creation date, index them,
 commit them and search them with the pre-determined query.
 My question is: if I make the entire process multi-threaded and two
 threads try to commit two different batches, will the commits
 happen in parallel? What if I am committing to different Solr cores?

 --

 -JAME



Re: Fault tolerant Solr replication architecture

2012-05-21 Thread findbestopensource
Hi Parvin,

Fault-tolerant architecture is something you need to decide based on your
requirements. At some point manual intervention may be required to recover
from a crash, so you need to decide what percentage of fault tolerance you
can support; it certainly may not be 100. Network failures can be handled
automatically, but crashes are much harder to handle.

Consider one master and two slaves. You could put a load balancer between
the slaves, so that you get round-robin or fail-over between them. If you
are not using a load balancer, then you should handle this in your
application.
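
If you are not using a dedicated load balancer, SolrJ itself ships a
simple client-side round-robin, LBHttpSolrServer; a sketch, with
illustrative slave URLs:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;

public class QuerySlaves {
    public static void main(String[] args) throws Exception {
        // Round-robins queries across the slaves and skips a dead one
        // until it comes back up.
        LBHttpSolrServer lb = new LBHttpSolrServer(
                "http://slave1:8983/solr", "http://slave2:8983/solr");
        System.out.println(lb.query(new SolrQuery("*:*")).getResults().getNumFound());
    }
}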

If the master crashes, then you may need to rebuild the index, but the
chances of that are low.

Regards
Aditya
www.findbestopensource.com



On Mon, May 21, 2012 at 12:55 PM, Parvin Gasimzade 
parvin.gasimz...@gmail.com wrote:

 Hi,

 I am using solr with replication. I have one master that indexes data and
 two slaves which pulls index from master and responds to the queries.

 My question is, how can I create a fault-tolerant architecture? I mean,
 what should I do when the master server crashes? I heard that a repeater
 is used for this type of architecture. Do I then have to create one
 master, one slave with a repeater, and one slave?

 Another question: if the master crashes, does the slave with the repeater
 start indexing automatically, or should I configure it manually?

 I asked similar question on the stackoverflow :

 http://stackoverflow.com/questions/10597053/fault-tolerant-solr-replication-architecture

 Any help will be appreciated.

 Regards,
 Parvin



Re: curl or nutch

2012-05-16 Thread findbestopensource
You could very well use Solr; it has support for indexing PDF and XML
files. If you want to crawl websites and search using page rank, then
choose Nutch.
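
For example, a PDF can be posted to Solr's extracting request handler
(Solr Cell / Tika) from SolrJ; a sketch for the 1.4/3.x SolrJ line,
assuming /update/extract is enabled in solrconfig.xml and with an
illustrative file name and id:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class IndexPdf {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("manual.pdf"));        // Tika extracts the text
        req.setParam("literal.id", "manual.pdf");   // value for the unique key field
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);
    }
}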

Regards
Aditya
www.findbestopensource.com


On Wed, May 16, 2012 at 1:13 PM, Tolga to...@ozses.net wrote:

 Hi,

 I have been trying for a week. I really want to get a start, so what
 should I use? curl or nutch? I want to be able to index pdf, xml etc. and
 search within them as well.

 Regards,



Re: authentication for solr admin page?

2012-05-15 Thread findbestopensource
I have written an article on this, describing the various steps to
restrict / authenticate access to the Solr admin interface:

http://www.findbestopensource.com/article-detail/restrict-solr-admin-access

Regards
Aditya
www.findbestopensource.com


On Thu, Mar 29, 2012 at 1:06 AM, geeky2 gee...@hotmail.com wrote:

 update -

 ok - i was reading about replication here:

 http://wiki.apache.org/solr/SolrReplication

 and noticed comments in the solrconfig.xml file related to HTTP Basic
 Authentication and the usage of the following tags:

 <str name="httpBasicAuthUser">username</str>
 <str name="httpBasicAuthPassword">password</str>

 Can I place these tags in the request handler to achieve an
 authentication scheme for the /admin page?

 // snipped from the solrconfig.xml file

  <requestHandler name="/admin/"
 class="org.apache.solr.handler.admin.AdminHandlers" />

 thanks for any help
 mark

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/authentication-for-solr-admin-page-tp3865665p3865747.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Large data set or data corpus

2012-01-11 Thread findbestopensource
Hello all,

Recently I saw a couple of discussions in a LinkedIn group about
generating a large data set or data corpus. I have compiled them into an
article; hope it is helpful. If you have any other links where we could
get large data sets for free, please reply to this mail thread and I will
update my article.

http://www.findbestopensource.com/article-detail/free-large-data-corpus

Regards
Aditya
www.findbestopensource.com


Re: Search Issue

2012-01-11 Thread findbestopensource
During indexing the @ is removed. You need to use your own tokenizer that
treats @rohit as one word.

Another option is to break the tweet into two fields, @username and the
tweet text. Index both fields, but don't tokenize the @username field
(e.g. use KeywordTokenizerFactory); index it as-is. When querying, search
both fields. This method helps fetch the tweets of a particular user.

Regards
Aditya
www.findbestopensource.com

On Wed, Jan 11, 2012 at 3:50 PM, Rohit ro...@in-rev.com wrote:

 Hi,



 We are storing a large number of tweets and blogs feeds into solr.



 Now if the user searches for twitter mentions like @rohit, records which
 just contain the word rohit are also returned, even if we do an exact
 match on @rohit. I understand this happens because of the use of
 WordDelimiterFilterFactory, which splits on special characters:




 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory



 How can I force Solr not to return matches without the @? Hope I am being clear.







 Regards,

 Rohit






Re: Thoughts on Search Analytics?

2011-05-06 Thread findbestopensource
1. Reports based on location, grouped by city / country
2. Total searches performed per hour / week / month
3. Frequently used search keywords
4. Analytics based on search keywords.

Regards
Aditya
www.findbestopensource.com


On Fri, May 6, 2011 at 3:55 AM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:

 Hi,

 I'd like to solicit your thoughts about Search Analytics if you are doing
 any sort of analysis/reporting of search logs or click stream or anything
 related.

 * Which information or reports do you find the most useful and why?
 * Which reports would you like to have, but don't have for whatever reason
 (don't have the needed data, or it's too hard to produce such reports, or
 ...)
 * Which tool(s) or service(s) do you use and find the most useful?

 I'm preparing a presentation on the topic of Search Analytics, so I'm
 trying to solicit opinions, practices, desires, etc. on this topic.

 Your thoughts would be greatly appreciated. If you could reply directly,
 that would be great, since this may be a bit OT for the list.

 Thanks!
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



Re: How can i use Solr based Search Engine for My University?

2011-05-06 Thread findbestopensource
Hello Anurag

Google is always there for internet search; what you need is search for
your university's own material. My opinion would be: don't crawl the
sites. You require only Solr, not Nutch.

Provide an interface for university students to upload documents. The
documents could be previous years' question papers, notes, e-books etc.
Scan the documents, convert them to PDF and upload them. Providing search
over these would be more valuable than crawling the sites.

Regards
Aditya
www.findbestopensource.com



On Fri, May 6, 2011 at 1:31 PM, Anurag anurag.it.jo...@gmail.com wrote:

 I am a student at Jamia Millia Islamia (http://jmi.ac.in/index.htm), a
 central university in India. I want to use my search engine for the
 benefit of students. The university has undergraduate, graduate, PhD etc.
 courses, including Engineering. Earlier, one of my teachers suggested
 developing an intranet search (for the LAN), but I am not able to figure
 out how to implement it. My university uses Google as its own site search
 tool.

 I am in the Engineering department and I see students (including me)
 using Xerox copies, previous year papers, notes etc. during exam time.
 People use the internet, say Google, to learn any topic that is not
 included in a book.

 Please give some valuable suggestions.

 Thanks

 -
 Kumar Anurag

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-can-i-use-Solr-based-Search-Engine-for-My-University-tp2907168p2907168.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is it possible to use sub-fields or multivalued fields for boosting?

2011-05-05 Thread findbestopensource
Hello deniz,

You could create a new field, say fullname, which is a copyField of
firstname and surname. Search on both the new field and location, but
boost the new field in the query.
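
A sketch of the query side with SolrJ and the dismax parser (the field
names fullname and location are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class BoostedNameSearch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("London");
        q.set("defType", "dismax");
        // Search both fields, but weight matches on the copied name field higher.
        q.set("qf", "fullname^2.0 location^1.0");
        System.out.println(server.query(q).getResults().getNumFound());
    }
}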

Regards
Aditya
www.findbestopensource.com



On Thu, May 5, 2011 at 9:21 AM, deniz denizdurmu...@gmail.com wrote:

 okay... let me make the situation more clear... I am trying to create an
 universal field which includes information about users like firstname,
 surname, gender, location etc. When I enter something e.g London, I would
 like to match any users having 'London' in any field firstname, surname or
 location. But if it matches name or surname, I would like to give a higher
 weight.

 so my question is... is it possible to have sub-fields? like
 <field name="universal">
   <field name="firstname">blabla</field>
   <field name="surname">blabla</field>
   <field name="gender">blabla</field>
   <field name="location">blabla</field>
 </field>

 or any other ideas for implementing such feature?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-sub-fields-or-multivalued-fields-for-boosting-tp2901992p2901992.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread findbestopensource
Hello Dominique Bejean,

Good job.

We have identified almost 8 open source web crawlers
(http://www.findbestopensource.com/tagged/webcrawler); I don't know how
far yours differs from the rest.

Your license states that it is not open source, but it is free for
personal use.

Regards
Aditya
www.findbestopensource.com


On Wed, Mar 2, 2011 at 5:55 AM, Dominique Bejean
dominique.bej...@eolya.fr wrote:

 Hi,

 I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web
 Crawler. It includes :

   * a crawler
   * a document processing pipeline
   * a solr indexer

 The crawler has a web administration in order to manage web sites to be
 crawled. Each web site crawl is configured with a lot of possible parameters
 (not all mandatory):

   * number of simultaneous items crawled by site
   * recrawl period rules based on item type (html, PDF, …)
   * item type inclusion / exclusion rules
   * item path inclusion / exclusion / strategy rules
   * max depth
   * web site authentication
   * language
   * country
   * tags
   * collections
   * ...

 The pipeline includes various ready-to-use stages (text extraction,
 language detection, a Solr-ready XML index writer, ...).

 Everything is very configurable and extensible, either by scripting or Java coding.

 With scripting, you can help the crawler handle javascript links, or help
 the pipeline extract the relevant title and clean up the HTML pages
 (remove menus, headers, footers, ...).

 With Java coding, you can develop your own pipeline stages.

 The Crawl Anywhere web site provides good explanations and screen shots.
 All is documented in a wiki.

 The current version is 1.1.4. You can download and try it out from here :
 www.crawl-anywhere.com


 Regards

 Dominique




Re: Does Solr supports indexing search for Hebrew.

2011-01-18 Thread findbestopensource
You may need to use a Hebrew analyzer.

http://www.findbestopensource.com/search/?query=hebrew

Regards
Aditya
www.findbestopensource.com


On Tue, Jan 18, 2011 at 2:34 PM, prasad deshpande 
prasad.deshpand...@gmail.com wrote:

 Hello,

 With reference to below links I haven't found Hebrew support in Solr.

 http://wiki.apache.org/solr/LanguageAnalysis

 http://lucene.apache.org/java/3_0_3/api/all/index.html

 If I want to index and search Hebrew files/data then how would I achieve
 this?

 Thanks,
 Prasad



Re: Spatial Search - Best choice ?

2010-07-15 Thread findbestopensource
Some more pointers for spatial search:

http://www.jteam.nl/products/spatialsolrplugin.html
http://code.google.com/p/spatial-search-lucene/
http://sujitpal.blogspot.com/2008/02/spatial-search-with-lucene.html

Regards
Aditya
www.findbestopensource.com



On Thu, Jul 15, 2010 at 3:54 PM, Saïd Radhouani r.steve@gmail.com wrote:

 Hi,

 Using Solr 1.4, I'm now working on adding spatial search options, such as
 distance-based sorting, Bounding-box filter, etc.

 To the best of my knowledge, there are three possible points we can start
 from:

 1. The
 http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/
 2. The gissearch.com
 3. The
 http://www.ibm.com/developerworks/opensource/library/j-spatial/index.html#resources

 I saw that these three options have been used but didn't see any comparison
 between them. Is there any one out there who can recommend one option over
 another?

 Thanks,
 -S


Re: Cache full text into memory

2010-07-14 Thread findbestopensource
You have two options:
1. Store the compressed text as part of a stored field in Solr.
2. Use external caching
(http://www.findbestopensource.com/tagged/distributed-caching);
you could use Ehcache / Memcached / Membase.

The problem with external caching is that you need to synchronize
deletions and modifications yourself. Fetching the stored field from Solr
is also faster.
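
A sketch of an in-process variant of option 2, using GZIP for the
compression and an access-ordered LinkedHashMap as the LRU; all class,
method and field names are illustrative:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedTextCache {
    private static final int MAX_ENTRIES = 100000;

    // Access-ordered LinkedHashMap: evicts the least recently used entry.
    private final Map<String, byte[]> cache =
            new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > MAX_ENTRIES;
                }
            };

    public synchronized void put(String docId, String fullText) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(fullText.getBytes("UTF-8"));  // keep only the compressed bytes
        gz.close();
        cache.put(docId, bos.toByteArray());
    }

    public synchronized String get(String docId) throws IOException {
        byte[] compressed = cache.get(docId);
        if (compressed == null) {
            return null;  // caller falls back to the stored field in Solr
        }
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = gz.read(buf)) > 0; ) {
            out.write(buf, 0, n);
        }
        return out.toString("UTF-8");
    }
}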

Regards
Aditya
www.findbestopensource.com


On Wed, Jul 14, 2010 at 12:08 PM, Li Li fancye...@gmail.com wrote:

 I want to cache full text into memory to improve performance.
 Full text is only used to highlight in my application(But it's very
 time consuming, My avg query time is about 250ms, I guess it will cost
 about 50ms if I just get top 10 full text. Things get worse when get
 more full text because, on disk, it is scattered everywhere for a query.).
 My full text per machine is about 200GB. The memory available for
 store full text is about 10GB. So I want to compress it in memory.
 Suppose compression ratio is 1:5, then I can load 1/4 full text in
 memory. I need a Cache component for it. Has anyone faced the problem
 before? I need some advice. Is it possible to use external tools such
 as memcached? Thank you.



Re: Cache full text into memory

2010-07-14 Thread findbestopensource
I have just given you two options. Since you already store the text as
part of the index, you could try external caching, e.g. Ehcache / Membase
(http://www.findbestopensource.com/tagged/distributed-caching). The caching
system will do the LRU eviction for you and is quite efficient.

On Wed, Jul 14, 2010 at 12:39 PM, Li Li fancye...@gmail.com wrote:

 I have already stored it in the Lucene index. But it is on disk, and when
 a query comes, Lucene must seek the disk to get it. I am not familiar
 with the Lucene cache. I just want to make full use of my memory: load
 10GB of text into memory with an LRU strategy when the cache is full. To
 load more into memory, I want to compress it in memory. I don't care much
 about disk space, so whether or not it's compressed in Lucene doesn't matter.

 2010/7/14 findbestopensource findbestopensou...@gmail.com:
   You have two options
  1. Store the compressed text as part of stored field in Solr.
  2. Using external caching.
  http://www.findbestopensource.com/tagged/distributed-caching
 You could use ehcache / Memcache / Membase.
 
  The problem with external caching is you need to synchronize the
 deletions
  and modification. Fetching the stored field from Solr is also faster.
 
  Regards
  Aditya
  www.findbestopensource.com
 
 
  On Wed, Jul 14, 2010 at 12:08 PM, Li Li fancye...@gmail.com wrote:
 
  I want to cache full text into memory to improve performance.
  Full text is only used to highlight in my application(But it's very
  time consuming, My avg query time is about 250ms, I guess it will cost
  about 50ms if I just get top 10 full text. Things get worse when get
  more full text because in disk, it scatters erverywhere for a query.).
  My full text per machine is about 200GB. The memory available for
  store full text is about 10GB. So I want to compress it in memory.
  Suppose compression ratio is 1:5, then I can load 1/4 full text in
  memory. I need a Cache component for it. Has anyone faced the problem
  before? I need some advice. Is it possbile using external tools such
  as MemCached? Thank you.
 
 



Re: Cache full text into memory

2010-07-14 Thread findbestopensource
I doubt that. A caching system is a key-value store; you have to use a
compression library yourself to compress and decompress the data. The
caching system only makes retrieval fast. Anyway, please take a look at
the features of each caching system.

Regards
Aditya
www.findbestopensource.com



On Wed, Jul 14, 2010 at 3:06 PM, Li Li fancye...@gmail.com wrote:

 Thank you. I don't know which cache system to use. In my application,
 the cache system must support a compression algorithm with a high
 compression ratio and fast decompression speed (because each time it
 gets an entry from the cache, it must decompress it).

 2010/7/14 findbestopensource findbestopensou...@gmail.com:
  I have just provided you two options. Since you already store as part of
 the
  index, You could try external caching. Try using ehcache / Membase
  http://www.findbestopensource.com/tagged/distributed-caching . The
 caching
  system will do LRU and is much more efficient.
 
  On Wed, Jul 14, 2010 at 12:39 PM, Li Li fancye...@gmail.com wrote:
 
  I have already store it in lucene index. But it is in disk and When a
  query come, it must seek the disk to get it. I am not familiar with
  lucene cache. I just want to fully use my memory that load 10GB of it
  in memory and a LRU stragety when cache full. To load more into
  memory, I want to compress it in memory. I don't care much about
  disk space so whether or not it's compressed in lucene .
 
  2010/7/14 findbestopensource findbestopensou...@gmail.com:
You have two options
   1. Store the compressed text as part of stored field in Solr.
   2. Using external caching.
   http://www.findbestopensource.com/tagged/distributed-caching
  You could use ehcache / Memcache / Membase.
  
   The problem with external caching is you need to synchronize the
  deletions
   and modification. Fetching the stored field from Solr is also faster.
  
   Regards
   Aditya
   www.findbestopensource.com
  
  
   On Wed, Jul 14, 2010 at 12:08 PM, Li Li fancye...@gmail.com wrote:
  
   I want to cache full text into memory to improve performance.
   Full text is only used to highlight in my application(But it's very
   time consuming, My avg query time is about 250ms, I guess it will
 cost
   about 50ms if I just get top 10 full text. Things get worse when get
   more full text because in disk, it scatters erverywhere for a
 query.).
   My full text per machine is about 200GB. The memory available for
   store full text is about 10GB. So I want to compress it in memory.
   Suppose compression ratio is 1:5, then I can load 1/4 full text in
   memory. I need a Cache component for it. Has anyone faced the problem
   before? I need some advice. Is it possbile using external tools such
   as MemCached? Thank you.
  
  
 
 



Re: Use of EmbeddedSolrServer

2010-06-11 Thread findbestopensource
Refer http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer
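
For the 1.4/3.x line, the wiki boils down to roughly this pattern (the
solr home path and core name below are illustrative):

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
        // Point solr.solr.home at a directory containing solr.xml / conf.
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "core0");
        // From here, use it like any SolrServer: add(), commit(), query(), ...
    }
}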

Regards
Aditya
www.findbestopensource.com


On Fri, Jun 11, 2010 at 2:25 PM, Robert Naczinski 
robert.naczin...@googlemail.com wrote:

 Hello experts,

 we would like to use Solr in our search application. We want to index
 a large inventory held in a database. The initial indexing is not a
 problem, but updates to the DB must also be picked up; we plan to put
 triggers on the relevant tables. The only problem is that the database
 runs on z/OS, and we get the updates via PL/1 procedures that send a
 message through MQSeries to our application; the index should then be
 updated.

 Is this plan valid?

 If so, I would receive the messages in my application with an EJB
 message-driven bean. A standalone Solr server is not what we want here;
 therefore, I would use EmbeddedSolrServer in an application deployed on
 WebSphere AppServer.
 Can I find a manual somewhere for the use of EmbeddedSolrServer?

 Regards,

 Robert



Re: Indexing link targets in HTML fragments

2010-06-07 Thread findbestopensource
Could you tell us the schema you use for indexing? In my opinion, the
StandardAnalyzer / Snowball analyzer will do best here; they will not
break up the URLs. Add href and other related HTML tag names to your stop
words and they will be removed during indexing.

Regards
Aditya
www.findbestopensource.com


On Mon, Jun 7, 2010 at 12:20 PM, Andrew Clegg andrew.cl...@gmail.com wrote:



 Lance Norskog-2 wrote:
 
  The PatternReplace and HTMLStrip tokenizers might be the right bet.
  The easiest way to go about this is to make a bunch of text fields
  with different analysis stacks and investigate them in the Schema
  Browser. You can paste an HTML document into the text box and see
  exactly how the words & markup get torn apart.
 

 Thanks Lance, I'll experiment.

 For reference, for anyone else who comes across this thread -- the html in
 my original post might have got munged on the way into or out of the list
 server. It was supposed to look like this:

 This is the entire content of my field, but [a
 href=http://example.com/]some of the words[/a] are a hyperlink.

 (but with real html tags instead of the square brackets)

 and I am just trying to extract the words and the link target but lose the
 rest of the markup.

 Cheers,

 Andrew.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Indexing-link-targets-in-HTML-fragments-tp874547p875503.html
  Sent from the Solr - User mailing list archive at Nabble.com.



Re: Query Question

2010-06-02 Thread findbestopensource
Which analyzer are you using for indexing and searching? Check your
schema.xml: you are currently using an analyzer that breaks up words. If
you don't want words broken up, you need to use
<tokenizer class="solr.KeywordTokenizerFactory"/>.

Regards
Aditya
www.findbestopensource.com



On Wed, Jun 2, 2010 at 2:41 PM, M.Rizwan muhammad.riz...@sigmatec.com.pk wrote:

 Hi,

 I have Solr 1.4. In the schema I have a field called title of type text.
 The problem is, when I search for Test_Title it brings back all documents
 with titles like Test-Title, Test_Title, Test,Title, Test Title,
 Test.Title.
 What can I do to avoid this?

 Test_Title should only return documents having title Test_Title

 Any idea?

 Thanks

 - Riz



Re: logic for auto-index

2010-06-02 Thread findbestopensource
You need to schedule your task. Check out the schedulers available for
every programming language:
http://www.findbestopensource.com/tagged/job-scheduler
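
In plain Java, a sketch with java.util.Timer that fires every 24 hours
starting at 03:00 (the hour and the indexing call are assumptions):

import java.util.Calendar;
import java.util.Timer;
import java.util.TimerTask;

public class NightlyIndexer {
    public static void main(String[] args) {
        Calendar firstRun = Calendar.getInstance();
        firstRun.set(Calendar.HOUR_OF_DAY, 3);  // 03:00, when server load is low
        firstRun.set(Calendar.MINUTE, 0);
        firstRun.set(Calendar.SECOND, 0);
        if (firstRun.getTimeInMillis() <= System.currentTimeMillis()) {
            firstRun.add(Calendar.DAY_OF_MONTH, 1);  // today's 03:00 already passed
        }
        new Timer().scheduleAtFixedRate(new TimerTask() {
            public void run() {
                // call your existing SolrJ indexing code here
            }
        }, firstRun.getTime(), 24L * 60 * 60 * 1000);
    }
}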

Regards
Aditya
www.findbestopensource.com



On Wed, Jun 2, 2010 at 2:39 PM, Jonty Rhods jonty.rh...@gmail.com wrote:

 Hi Peter,

 actually I want the indexing process to start automatically; right now I
 am doing it manually.
 I also want to start indexing when there is less load on the server, i.e.
 late at night. So setting it to run automatically will fix my
 problem.

  On Wed, Jun 2, 2010 at 2:00 PM, Peter Karich peat...@yahoo.de wrote:

  Hi Jonty,
 
  what is your specific problem?
  You could use a cronjob or the Java-lib called quartz to automate this
  task.
  Or did you mean replication?
 
  Regards,
  Peter.
 
   Hi All,
  
   I am very new to solr as well as java too.
   I require to use solrj for indexing also require to index automatically
  once
   in 24 hour.
   I wrote java code for indexing now I want to do further coding for
  automatic
   process.
   Could you suggest or give me sample code for automatic index process..
   please help..
  
   with regards
   Jonty.
  
 



Re: newbie question on how to batch commit documents

2010-06-01 Thread findbestopensource
Move the commit after the loop. I would also advise committing from a
separate thread: I keep a separate timer thread that commits every minute,
and at the end of every day I optimize the index.
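
A sketch of that timer thread with ScheduledExecutorService (intervals and
URL are illustrative):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CommitScheduler {
    public static void main(String[] args) throws Exception {
        final SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Commit once a minute instead of once per batch.
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try { server.commit(); } catch (Exception e) { e.printStackTrace(); }
            }
        }, 1, 1, TimeUnit.MINUTES);
        // Optimize once a day.
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try { server.optimize(); } catch (Exception e) { e.printStackTrace(); }
            }
        }, 24, 24, TimeUnit.HOURS);
    }
}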

Regards
Aditya
www.findbestopensource.com


On Tue, Jun 1, 2010 at 2:57 AM, Steve Kuo kuosen...@gmail.com wrote:

 I have a newbie question on what is the best way to batch add/commit a
 large collection of document data via solrj. My first attempt was to
 write a multi-threaded application that did the following.

 Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
 for (Widget w : widgets) {
     SolrInputDocument doc = new SolrInputDocument();
     doc.addField("id", w.getId());
     doc.addField("name", w.getName());
     doc.addField("price", w.getPrice());
     doc.addField("category", w.getCat());
     doc.addField("srcType", w.getSrcType());
     docs.add(doc);

     // commit docs to solr server
     server.add(docs);
     server.commit();
 }

 And I got this exception.

 org.apache.solr.common.SolrException:
 Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later

     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
     at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)

 The solrj wiki/documents seemed to indicate that because multiple threads
 were calling SolrServer.commit(), which in turn called
 CommonsHttpSolrServer.request(), this resulted in multiple searchers. My
 first thought was to change the configs for autowarming, but after
 looking at the autowarm params, I am not sure what can be changed, or
 perhaps a different approach is recommended.

 <filterCache
   class="solr.FastLRUCache"
   size="512"
   initialSize="512"
   autowarmCount="0"/>

 <queryResultCache
   class="solr.LRUCache"
   size="512"
   initialSize="512"
   autowarmCount="0"/>

 <documentCache
   class="solr.LRUCache"
   size="512"
   initialSize="512"
   autowarmCount="0"/>

 Your help is much appreciated.



Re: Using solrJ to get all fields in a particular schema/index

2010-05-25 Thread findbestopensource
To retrieve all documents, you need to use the query/filter *:*
Regards
Aditya
www.findbestopensource.com


On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,
   Is there any way to get all the fields (irrespective of whether
 it contains a value or null) in solrDocument.
 or
 Is there any way to get all the fields in schema.xml of the url link (
 http://localhost:8983/solr/core0/)??

 Regards,
 Raakhi




Re: Using solrJ to get all fields in a particular schema/index

2010-05-25 Thread findbestopensource
Resending, as there was a typo in my previous mail.

To retrieve all documents, use the query/filter *:* (match all). To match
only documents that have a value in a particular field, use
FieldName:[* TO *].
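
For listing all the fields declared in schema.xml via SolrJ, the Luke
request handler can help; a sketch, assuming SolrJ's LukeRequest and an
illustrative core URL:

import java.util.Map;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class ListSchemaFields {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/core0");
        LukeRequest luke = new LukeRequest();
        luke.setShowSchema(true);  // report schema fields, not only indexed ones
        LukeResponse rsp = luke.process(server);
        for (Map.Entry<String, LukeResponse.FieldInfo> e : rsp.getFieldInfo().entrySet()) {
            System.out.println(e.getKey());
        }
    }
}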


Regards
Aditya
www.findbestopensource.com


On Tue, May 25, 2010 at 4:29 PM, findbestopensource 
findbestopensou...@gmail.com wrote:

  To retrieve all documents, you need to use the query/filter *:*
 Regards
 Aditya
 www.findbestopensource.com


 On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,
   Is there any way to get all the fields (irrespective of whether
 it contains a value or null) in solrDocument.
 or
 Is there any way to get all the fields in schema.xml of the url link (
 http://localhost:8983/solr/core0/)??

 Regards,
 Raakhi





Re: Using solrJ to get all fields in a particular schema/index

2010-05-25 Thread findbestopensource
If a field doesn't have a value, you will get NULL on retrieving it. How
could you expect a value for a field that was never provided?

You have two options; choose either one:
1. If the field value comes back NULL, handle it and display a proper
user-defined message.
2. At index time, add a dummy value, say NO_VALUE, to any title field that
doesn't have a value.

Regards
Aditya
www.findbestopensource.com




On Tue, May 25, 2010 at 5:20 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi Aditya,
   I can retrieve all documents, but cannot retrieve all the fields
 in a document (if a field does not have any value).

 For example, I get a list of documents; some of the documents have a
 value for the title field, and others might not contain a value for it.
 In any case I need to get an entry for title in getFieldNames().

 How do I go about that?

 Regards,
 Raakhi


 On Tue, May 25, 2010 at 5:07 PM, findbestopensource 
  findbestopensou...@gmail.com wrote:

  Resending it as there is a typo error.
 
  To reterive all documents, You need to use the query/filter FieldName:*:*
 .
 
 
  Regards
  Aditya
  www.findbestopensource.com
 
 
  On Tue, May 25, 2010 at 4:29 PM, findbestopensource 
  findbestopensou...@gmail.com wrote:
 
   To reterive all documents, You need to use the query/filter
  *FieldName:*:*
   *
   Regards
   Aditya
   www.findbestopensource.com
  
  
   On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com
  wrote:
  
   Hi,
 Is there any way to get all the fields (irrespective of
  whether
   it contains a value or null) in solrDocument.
   or
   Is there any way to get all the fields in schema.xml of the url link (
   http://localhost:8983/solr/core0/)??
  
   Regards,
   Raakhi
  
  
  
 



Re: Personalized Search

2010-05-20 Thread findbestopensource
Hi Rih,

Are you going to include one of the two fields, bought or like, per
member/visitor, OR a unique field per member/visitor?

If only one or two common fields are included, there will not be any
performance impact. If you want a per-user field, you need to consider a
multi-valued field instead; otherwise you will certainly hit the wall.
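
A sketch of the multi-valued variant at index time (the field name
liked_by is an assumption and would need to be declared multiValued in
schema.xml):

import org.apache.solr.common.SolrInputDocument;

public class ProductDoc {
    public static SolrInputDocument build(String productId, long[] likedByUserIds) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", productId);
        // Multi-valued field: call addField once per value.
        for (long userId : likedByUserIds) {
            doc.addField("liked_by", userId);
        }
        return doc;
    }
}

At query time the same field is then filtered with fq=liked_by:12345.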

Regards
Aditya
www.findbestopensource.com




On Thu, May 20, 2010 at 12:13 PM, Rih tanrihae...@gmail.com wrote:

 Has anybody done personalized search with Solr? I'm thinking of including
 fields such as bought or like per member/visitor via dynamic fields to
 a
 product search schema. Another option is to have a multi-value field that
 can contain user IDs. What are the possible performance issues with this
 setup?

 Looking forward to your ideas.

 Rih



Re: Moving from Lucene to Solr?

2010-05-19 Thread findbestopensource
Hi Peter,

Use Lucene when:

   - You want more control
   - You cannot depend on any web server
   - You want to use term vectors, term docs etc.
   - You want to easily plug in your own Analyzer

Use Solr when:

   - You want to index and search docs easily, writing very little code
   - You want a standalone app that takes care of most things, like
   optimizing and warming up the reader
   - You want to extend to multiple nodes
   - You want to use facets

If you are developing your client in Java and want to use Solr, then I
would advise using SolrJ, as it is easy and you don't need to care about
the HTTP details. I use Solr via SolrJ in my project www.findbestopensource.com

Regards
Aditya
www.findbestopensource.com



On Wed, May 19, 2010 at 4:08 PM, Peter Karich peat...@yahoo.de wrote:

 Hi all,

 while asking a question on stackoverflow [1] some other questions appear:
 Is SolrJ a recommended way to access Solr or should I prefer the HTTP
 interface?

 How can I (j)unit-test Solr? (e.g. create+delete index via Java call)

 Is Lucene faster than Solr? ... do you have experiences, preferable with
 the same index?

 The background is an application which uses Lucene at the moment, but I
 really need the faceting feature of Solr and I don't want to implement
 it in Lucene myself.

 Regards,
 Peter.

 [1]

 http://stackoverflow.com/questions/2856427/situations-to-prefer-apache-lucene-over-solr




Re: Solr Deployment Question

2010-05-14 Thread findbestopensource
Please explain how you have handled two indexes in a single JVM. Is it
multicore?

To identify memory consumption, calculate the used memory before and after
loading the indexes; in general, calculate the used memory before and
after any checkpoint you want to analyse. The difference gives you the
actual memory consumption.
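
A sketch of that measurement (the System.gc() call only reduces noise; it
is not guaranteed to collect everything):

public class MemorySnapshot {
    // Heap currently in use, in bytes.
    static long usedMemory() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedMemory();
        // ... load / warm the index here ...
        long after = usedMemory();
        System.out.println("index consumed ~" + ((after - before) / (1024 * 1024)) + " MB");
    }
}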

Regards
Aditya
http://www.findbestopensource.com


On Fri, May 14, 2010 at 11:14 AM, Maduranga Kannangara 
mkannang...@infomedia.com.au wrote:

 But even when we used a single index, we were running out of memory.
 What do you mean by active? There are no queries on the masters;
 only one index is being processed/optimized.

 Also, if I may add to my question: how can I find the
 amount of memory that an index would use, theoretically?
 i.e. is there a formula etc.?

 Thanks
 Madu



 -Original Message-
 From: findbestopensource [mailto:findbestopensou...@gmail.com]
 Sent: Friday, 14 May 2010 3:34 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Deployment Question

  You may use one index at a time, but both indexes are active and have
 loaded their terms into memory. Memory consumption will certainly be higher.

 Regards
 Aditya
 http://www.findbestopensource.com

 On Fri, May 14, 2010 at 10:28 AM, Maduranga Kannangara 
 mkannang...@infomedia.com.au wrote:

  Hi
 
  We use separate JVMs to Index and Query.
  (Client applications will query only slaves,
  while master does only indexing)
 
  Recently we moved a two master indexes to
  a single JVM. Our memory allocation was for
  each index was 512Mb and 1Gb.
 
  Once we moved both indexes to a single VM,
  we thought it would still Index using 1Gb as we
  use only one index at a time. But for our surprise
  it needed more than that (1.2Gb) even though
  only one index was used at a time.
 
  Can I know why, or can I know how to find
  why this is?
 
  Solr 1.4
  Java 1.6.0_20
 
  We use a VPS for deployment.
 
  Thanks in advance
  Madu
 
 
 



Re: Solr Deployment Question

2010-05-13 Thread findbestopensource
You may use one index at a time, but both indexes are active and have
loaded their terms into memory. Memory consumption will certainly be higher.

Regards
Aditya
http://www.findbestopensource.com

On Fri, May 14, 2010 at 10:28 AM, Maduranga Kannangara 
mkannang...@infomedia.com.au wrote:

 Hi

 We use separate JVMs to Index and Query.
 (Client applications will query only slaves,
 while master does only indexing)

  Recently we moved two master indexes to
 a single JVM. Our memory allocation
 for the indexes was 512 MB and 1 GB.

 Once we moved both indexes to a single VM,
 we thought it would still index using 1 GB, as we
 use only one index at a time. But to our surprise
 it needed more than that (1.2 GB), even though
 only one index was used at a time.

 Can I know why, or how to find out
 why this is?

 Solr 1.4
 Java 1.6.0_20

 We use a VPS for deployment.

 Thanks in advance
 Madu





Re: multi-valued associated fields

2010-05-12 Thread findbestopensource
Hello Eric,

Certainly it is possible. I would strongly advise having a field that
differentiates the record type (RECORD_TYPE:CAR / PROPERTY).

In general I was also wondering how Solr developers implement websites
that use tag filters. For example, a user clicks on Hard drives, then gets
the tags External and Internal, then clicks on External and gets usb,
firewire etc. You could achieve this with faceting queries.
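
A sketch of the drill-down with SolrJ faceting (the field name tags and
the tag values are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TagDrilldown {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("tags");                   // remaining tags, with counts
        q.addFilterQuery("tags:\"Hard drives\"");  // the tag the user clicked
        QueryResponse rsp = server.query(q);
        for (FacetField.Count c : rsp.getFacetFields().get(0).getValues()) {
            System.out.println(c.getName() + " (" + c.getCount() + ")");
        }
    }
}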

Regards
Aditya
www.findbestopensource.com




On Wed, May 12, 2010 at 12:29 PM, Eric Grobler impalah...@googlemail.com wrote:

 Hallo Solr community,

 We are considering Solr for searching on content from various partners
 with wildly different content.

 Is it possible or practical to work with multi-valued associated fields
 like
 this?
 Make:Audi, Model:A4, Color:Blue, Year:1998, KM:20, Extras:GPS
 Type:Flat, Rooms:2, Period:6 months
 Make:Toshiba, Model:Tecra, RAM:4GB, Extras:BlueRay;Lock
 Breed:Siamese, Age:9 weeks

 and do:
 - searching on individual keys
 - range queries within multi-valued fields.
 - faceting

 I suppose an alternative would be to create unnamed fields like
 range1, range2, range3 with a descriptor field like
  Year,KM,EngineSize for a car document and
  Rooms for a property document, for example.

 In general I was also wondering how Solr developers implement websites that
 uses tag filters.
 For example, a user clicks on Hard drives then get tags External,
 Internal then clicks on External and gets usb, firewire etc.

 Any suggestions and feedback would be greatly appreciated.

 Regards
 Eric



Re: Solr 1.4 Enterprise Search Server book examples

2010-04-27 Thread findbestopensource
I downloaded the 5883_Code.zip file but was not able to extract the
complete contents.

Regards
Aditya
www.findbestopensource.com



On Tue, Apr 27, 2010 at 12:45 AM, Johan Cwiklinski 
johan.cwiklin...@ajlsm.com wrote:

 Hello,

 On 26/04/2010 20:53, findbestopensource wrote:
  I am able to successfully download the code. It is 360 MB and took a
  lot of time to download.

 I'm also able to download the file ; but not to extract many of the
 files it contains after download (can list them but not extract, an
 error occurs).

 Are you able to extract the ZIP archive you've downloaded?


  https://www.packtpub.com/solr-1-4-enterprise-search-server/book
  Select the download the code link and provide your email id, Download
 link
  will be sent via email.
 
  Regards
  Aditya
  www.findbestopensource.com
 
 
 
  On Mon, Apr 26, 2010 at 8:34 PM, Abdelhamid ABID aeh.a...@gmail.com
 wrote:
 
  Hi,
  I'm also interested to get those examples, would someone to share them ?
 
  On 4/26/10, markus.rietz...@rzf.fin-nrw.de 
 markus.rietz...@rzf.fin-nrw.de
 
  wrote:
   
  I have sent you a private mail.
 
  markus
 
  -----Original Message-----
  From: Johan Cwiklinski [mailto:johan.cwiklin...@ajlsm.com]
  Sent: Monday, 26 April 2010 10:58
  To: solr-user@lucene.apache.org
  Subject: Solr 1.4 Enterprise Search Server book examples
 
  Hello,
 
  We've recently acquired the Solr 1.4 Enterprise Search Server book.
 
  I've tried to download the example ZIP file from the editor's website,
  but the file is actually corrupted, and I cannot unzip it :(
 
  Could someone tell me if I can get these examples from
  another location?
 
  I've send a message last week to the editor reporting the issue, but
  that is not yet fixed ; and I'd really like to take a look at the
  example code and make some tests.
 
  Regards,
  --
  Johan Cwiklinski
 
 
 
 
 
  --
  Abdelhamid ABID
  Software Engineer- J2EE / WEB
 
 

 --
 Johan Cwiklinski



Re: hybrid approach to using cloud servers for Solr/Lucene

2010-04-25 Thread findbestopensource
Hello Dennis,

If the load goes up, then queries are sent to the cloud at a certain
point.
My advice is to load-balance between local and cloud; your local system
seems capable, as it is a dedicated host. Another option is to do the
indexing locally and sync the index to the cloud, so the cloud is used
only for search.

Hope it helps.

Regards
Aditya
www.findbestopensource.com


On Mon, Apr 26, 2010 at 7:47 AM, Dennis Gearon gear...@sbcglobal.net wrote:

 I'm working on an app that could grow much faster and bigger than I could
 scale local resources, at least on certain dates and for other reasons.

 So I'd like to run a local machine in a dedicated host or even virtual
 machine at a host.

 If the load goes up, then queries are sent to the cloud at a certain point.

 Is this practical, anyone have experience in this?

 This is obviously a search engine app based on solr/lucene if someone is
 wondering.

 Dennis Gearon

 Signature Warning
 
 EARTH has a Right To Life,
  otherwise we all die.

 Read 'Hot, Flat, and Crowded'
 Laugh at http://www.yert.com/film.php



Re: Best Open Source

2010-04-22 Thread findbestopensource
Thank you Dave and Michael for your feedback.

We are currently in beta and will fix these issues soon.

Regards
Aditya
www.findbestopensource.com



On Tue, Apr 20, 2010 at 3:01 PM, Michael Kuhlmann 
michael.kuhlm...@zalando.de wrote:

 Nice site. Really!

 In addition to Dave:
 How do I search with tags enabled?
 If I search for Blog, I can see that there's one blog software written
 in Java. When I click on the Java tag, then my search is discarded, and
 I get all Java software. When I do my search again, the tag filter is
 lost. It seems to be impossible to combine tag filters with search.

 -Michael

  On 20.04.2010 11:00, solai ganesh wrote:
   Hello all,
 
  We have launched a new site hosting the best open source products and
  libraries across all categories. This site is powered by Solr search.
  There are many open source products available in all categories and it
  is sometimes difficult to identify which is the best. We identify the
  best. As open source users, you might be using many open source
  products and libraries; it would be great if you helped us identify the
  best.
 
  http://www.findbestopensource.com/
 
  Regards
  Aditya