Re: Solr, How to index scripts *.sh and *.SQL

2014-05-15 Thread Alexei Martchenko
Same in Windows: just plain text files, no metadata, no headers.


alexei martchenko
Facebook http://www.facebook.com/alexeiramone |
LinkedIn http://br.linkedin.com/in/alexeimartchenko |
Steam http://steamcommunity.com/id/alexeiramone/ |
4sq https://pt.foursquare.com/alexeiramone | Skype: alexeiramone |
Github https://github.com/alexeiramone | (11) 9 7613.0966 |


2014-05-11 4:32 GMT-03:00 Gora Mohanty g...@mimirtech.com:

 On 8 May 2014 12:25, Visser, Marc marc.viss...@ordina.nl wrote:
 
  Hi All,
  Recently I have set up an image with SOLR. My goal is to index and
 extract files on a Windows and Linux server. It is possible for me to index
 and extract data from multiple file types. This is done by the SOLR CELL
 request handler. See the post.jar cmd below.
 
  java -Dauto -Drecursive -jar post.jar Y:\ SimplePostTool version 1.5
 Posting files to base url localhost:8983/solr/update.. Entering auto mode.
 File endings considered are xml,json,csv,pdf,doc,docx,ppt,pp
 tx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log Entering recursive
 mode, max depth=999, delay=0s 0 files indexed.
 
  Is it possible to index and extract metadata/content from file types
 like .sh and .sql? If it is possible I would like to know how of course :)

 Don't know about Windows, but on Linux these are just text files. What
 metadata are you referring to? Normally, a Linux text file only has
 content,
 unless you are talking about metadata such as obtained from:
file cmd.sh

 Regards,
 Gora



Re: Required fields

2014-03-21 Thread Alexei Martchenko
false
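For context, the attribute in question lives on each field definition in schema.xml. A minimal sketch (field names here are illustrative, not from the thread):

```xml
<!-- schema.xml: "required" defaults to false, so omitting it and writing
     required="false" are equivalent -->
<field name="id"    type="string"       indexed="true" stored="true" required="true"/>
<field name="notes" type="text_general" indexed="true" stored="true"/>
```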




2014-03-21 17:17 GMT-03:00 Walter Underwood wun...@wunderwood.org:

 What is the default value for the required attribute of a field element in
 a schema? I've just looked everywhere I can think of in the wiki, the
 reference manual, and the JavaDoc. Most of the documentation doesn't even
 mention that attribute.

 Once we answer this, it should be added to the documented attributes for
 field.

 wunder
 --
 Walter Underwood
 wun...@wunderwood.org






Re: Indexing large documents

2014-03-19 Thread Alexei Martchenko
Even the most non-structured data has to have some breakpoint. I've seen
projects running solr that used to index whole books one document per
chapter plus a synopsis boosted doc. The question here is how you need to
search and match those docs.




2014-03-18 23:52 GMT-03:00 Stephen Kottmann 
stephen_kottm...@h3biomedicine.com:

 Hi Solr Users,

 I'm looking for advice on best practices when indexing large documents
 (100's of MB or even 1 to 2 GB text files). I've been hunting around on
 google and the mailing list, and have found some suggestions of splitting
 the logical document up into multiple solr documents. However, I haven't
 been able to find anything that seems like conclusive advice.

 Some background...

 We've been using solr with great success for some time on a project that is
 mostly indexing very structured data - i.e. mainly based on ingesting
 through DIH.

 I've now started a new project and we're trying to make use of solr again -
 however, in this project we are indexing mostly unstructured data - pdfs,
 powerpoint, word, etc. I've not done much configuration - my solr instance
 is very close to the example provided in the distribution aside from some
 minor schema changes. Our index is relatively small at this point ( ~3k
 documents ), and for initial indexing I am pulling documents from a http
 data source, running them through Tika, and then pushing to solr using
 solrj. For the most part this is working great... until I hit one of these
 huge text files and then OOM on indexing.

 I've got a modest JVM - 4GB allocated. Obviously I can throw more memory at
 it, but it seems like maybe there's a more robust solution that would scale
 better.

 Is splitting the logical document into multiple solr documents best
 practice here? If so, what are the considerations or pitfalls of doing this
 that I should be paying attention to? I guess when querying I always need
 to use a group-by field to prevent multiple hits for the same document. Are
 there issues with term frequency, etc. that you need to work around?

 Really interested to hear how others are dealing with this.

 Thanks everyone!
 Stephen




Re: [ANNOUNCE] Heliosearch 0.04

2014-03-14 Thread Alexei Martchenko
Chrome on Windows reports the latest Heliosearch download as probable malware
and asks whether to keep or discard it. Norton says everything's OK with that
file. Are you guys aware of this?




2014-03-14 12:58 GMT-03:00 Yago Riveiro yago.rive...@gmail.com:

 Is it possible to switch between Solr 4.6.1 and Heliosearch in a
 transparent way?


 On Fri, Mar 14, 2014 at 3:56 PM, Mike Murphy mmurphy3...@gmail.com
 wrote:

  This is fantastic!
  I tried swapping in heliosearch for a customer that was having big
  garbage collection issues, and all of the big gc pauses simply
  disappeared!
 
  Now the problem - heliosearch only has a pre-release out based on solr
  trunk.  Are there near term plans for a more stable release that would
  be advisable for production use?
 
  --Mike
 
  On Mon, Mar 10, 2014 at 1:04 PM, Yonik Seeley yo...@heliosearch.com
  wrote:
   Changes from the previous release are primarily off-heap FieldCache
   support for strings as well as all numerics (the previous release
   only had integer support).
  
   Benchmarks for string fields here:
   http://heliosearch.org/hs-solr-off-heap-fieldcache-performance
  
   Try it out here: https://github.com/Heliosearch/heliosearch/releases/
  
   -Yonik
   http://heliosearch.org - native off-heap filters and fieldcache for
 solr
 



 --
 /Yago Riveiro



Re: ExtendedDismax and NOT operator

2014-02-07 Thread Alexei Martchenko
Just to clarify: is the actual URL properly space-escaped?

http://localhost:8983/solr/distrib/select?q=term1%20NOT%20term2&start=0&rows=0&qt=edismax_basic&debugQuery=true





2014-02-07 12:40 GMT-02:00 Geert Van Huychem ge...@iframeworx.be:

 Hi

 This is my config:

   <requestHandler name="edismax_basic" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="defType">edismax</str>
       <str name="qf">body</str>
       <str name="pf">title^30 introduction^15 body^10</str>
       <str name="ps">0</str>
     </lst>
   </requestHandler>

 Executing the following link:
 http://localhost:8983/solr/distrib/select?q=term1 NOT
 term2&start=0&rows=0&qt=edismax_basic&debugQuery=true

 gives me as debuginfo:

 <str name="parsedquery">
 (+(DisjunctionMaxQuery((body:term1)) -DisjunctionMaxQuery((body:term2)))
 DisjunctionMaxQuery((title:"term1 term2"^30.0))
 DisjunctionMaxQuery((introduction:"term1 term2"^15.0))
 DisjunctionMaxQuery((body:"term1 term2"^10.0)))/no_coord
 </str>

 My question is: why is term2 included in the phrase query part?

 Best
 Geert Van Huychem



Re: Import data from mysql to sold

2014-02-04 Thread Alexei Martchenko
1) Yes, it's the JDBC connection URL/URI. You can use a preconfigured JNDI
datasource instead. It's all here:
http://wiki.apache.org/solr/DataImportHandler

2) It's a mapping: "column" is the database column and "name" is your Solr
destination field. You only need to specify "name" when the two differ.

DIH looks like a seven-headed dragon the first time you see it, but by the
end of the day you'll love it.
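To illustrate point 2, the mapping and the schema field it targets sit in two files; a sketch (field names are only examples, not taken from the original config):

```xml
<!-- data-config.xml: database column "id" lands in the Solr field "user_id" -->
<field column="id" name="user_id"/>

<!-- schema.xml: the destination field the mapping above refers to -->
<field name="user_id" type="string" indexed="true" stored="true"/>
```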





2014-02-04 rachun rachun.c...@gmail.com:

 please see below code..

 <dataConfig>
   <dataSource type="JdbcDataSource"
       driver="com.mysql.jdbc.Driver"
       url="jdbc:mysql://localhost:3306/mydb01"
       user="root"
       password=""/>
   <document>
     <entity name="users" query="select id,firstname,username from users">
       <field column="id" name="user_id" />
       <field column="firstname" name="user_firstname" />
     </entity>
   </document>
 </dataConfig>

 my question is..
 1. what is the url for? (url="jdbc:mysql://localhost:3306/mydb01")
 does it mean my database url?

 2. did i do it right with this?
 <field column="id" name="user_id" />
 i'm not sure whether "name" means the field in Solr?


 Thank you very much,
 Chun.






 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Import-data-from-mysql-to-sold-tp4114982p4115191.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Import data from mysql to sold

2014-02-03 Thread Alexei Martchenko
I've been using DIH to import large databases into XML file batches, and
it's blazing fast.




2014-02-03 rachun rachun.c...@gmail.com:

 Dear all gurus,

 I would like to import my data (MySQL), about 4 million rows, into Solr
 4.6.
 What is the best way to do it?

 Please suggest me.

 Million thanks,
 Chun.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Import-data-from-mysql-to-sold-tp4114982.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Apache Solr.

2014-02-03 Thread Alexei Martchenko
That's right, Solr doesn't ingest PDFs the way it ingests XML. You'll need
to use Tika to extract binary/specific file types.

http://tika.apache.org/1.4/formats.html
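If you go through Solr rather than standalone Tika, the usual route is Solr Cell (the ExtractingRequestHandler), which wraps Tika. A sketch of the solrconfig.xml entry - the handler path and field names are illustrative and may vary by version:

```xml
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map Tika's extracted body text into the schema's "text" field -->
    <str name="fmap.content">text</str>
    <!-- prefix unmapped Tika metadata fields instead of failing on them -->
    <str name="uprefix">attr_</str>
  </lst>
</requestHandler>
```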




2014-02-03 Siegfried Goeschl sgoes...@gmx.at:

 Hi Vignesh,

 a few keywords for further investigations

 * Solr Data Import Handler
 * Apache Tika
 * Apache PDFBox

 Cheers,

 Siegfried Goeschl


 On 03.02.14 09:15, vignesh wrote:

 Hi Team,



  I am Vignesh. I am using Apache Solr 3.6 and able to index
 XML files, and am now trying to index PDF files but am not able to. Can you
 give me the steps to carry out PDF indexing? It will be very useful. Kindly
 guide me through this process.





 Thanks & Regards.

 Vignesh.V




 Ninestars Information Technologies Limited.,

 72, Greams Road, Thousand Lights, Chennai - 600 006. India.

 Landline : +91 44 2829 4226 / 36 / 56   X: 144

  www.ninestars.in









Re: Disabling Commit/Auto-Commit (SolrCloud)

2014-01-31 Thread Alexei Martchenko
Why don't you set both solrconfig commits to very high values and issue an
explicit commit command with your sparse, small updates?

I've been doing this for ages and it works perfectly for me.
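In solrconfig.xml terms, that advice looks roughly like this (values are illustrative, not recommendations):

```xml
<!-- effectively park both auto commits so the client drives visibility -->
<autoCommit>
  <maxTime>3600000</maxTime>   <!-- hard commit at most once an hour -->
</autoCommit>
<autoSoftCommit>
  <maxTime>-1</maxTime>        <!-- -1 disables auto soft commits -->
</autoSoftCommit>
```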




2014-01-31 Software Dev static.void@gmail.com:

 Is there a way to disable commit/hard-commit at runtime? For example, we
 usually have our hard commit and soft commit set really low, but when we do
 bulk indexing we would like to disable this to increase performance. If
 there isn't an easy way of doing this, would simply pushing a new
 solrconfig to SolrCloud work?



Re: Disabling Commit/Auto-Commit (SolrCloud)

2014-01-31 Thread Alexei Martchenko
I didn't mean to disable it, just to put some high value there. I have a
script that updates my solr in batches of thousands, so I set my commit to
100,000 because when it runs it updates 100,000 records in a short time.

The other script updates in batches of hundreds and it's not so fast, so its
internal loops issue a commit after X loops and/or when it finishes
processing.




2014-01-31 Mark Miller markrmil...@gmail.com:

 It's not a good idea to disable hard commit because the transaction log can
 grow without limit in RAM.

 Also, try some performance tests. I've never seen it matter if it's set to
 like a minute, both for bulk and NRT.

 As far as soft commit, you could turn it off and control visibility when
 adding docs via commitWithin.

 - Mark

 http://about.me/markrmiller

 On Jan 31, 2014, at 12:45 PM, Software Dev static.void@gmail.com
 wrote:

  Is there a way to disable commit/hard-commit at runtime? For example, we
  usually have our hard commit and soft-commit set really low but when we
 do
  bulk indexing we would like to disable this to increase performance. If
  there isn't an easy way of doing this would simply pushing a new
  solrconfig to solrcloud work?
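For reference, the commitWithin approach Mark mentions is set per update request rather than in solrconfig.xml; a sketch of an XML update (document fields are illustrative):

```xml
<!-- docs in this request become searchable within 60 seconds,
     with no explicit commit from the client -->
<add commitWithin="60000">
  <doc>
    <field name="id">doc-1</field>
  </doc>
</add>
```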




Re: Regarding Solr Faceting on the query response.

2014-01-30 Thread Alexei Martchenko
I believe it's not possible to facet only the page you are on; faceting is
supposed to work only with the full result set. I've never tried it, but
I've also never seen a way this could be done.




2014-01-30 Mikhail Khludnev mkhlud...@griddynamics.com:

 Hello
 Do you mean setting
 http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount to 1 or
 you want to facet only returned page (rows) instead of full resultset
 (numFound) ?


 On Thu, Jan 30, 2014 at 6:24 AM, Nilesh Kuchekar
 kuchekar.nil...@gmail.comwrote:

  Yeah it's a typo... I meant company:Apple
 
  Thanks
  Nilesh
 
   On Jan 29, 2014, at 8:59 PM, Alexandre Rafalovitch arafa...@gmail.com
 
  wrote:
  
   On Thu, Jan 30, 2014 at 3:43 AM, Kuchekar kuchekar.nil...@gmail.com
  wrote:
   company=Apple
   Did you mean company:Apple ?
  
   Otherwise, that could be the issue.
  
   Regards,
 Alex.
  
  
   Personal website: http://www.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all
   at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
   book)
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



Re: Solr Nutch

2014-01-28 Thread Alexei Martchenko
1) Plus, those files are sometimes binaries with metadata; specific
crawlers need to understand them. HTML is plain text.

2) Yes, different data schemes. Sometimes I replicate the same core and
make some A-B tests with different weights, filters etc., and some people
like to create CoreA and CoreB with the same schema, hammer CoreA with
updates, commits and optimizes, then make it available for searches while
hammering CoreB. Then swap again. This produces faster searches.




2014-01-28 Jack Krupansky j...@basetechnology.com

 1. Nutch follows the links within HTML web pages to crawl the full graph
 of a web of pages.

 2. Think of a core as an SQL table - each table/core has a different type
 of data.

 3. SolrCloud is all about scaling and availability - multiple shards for
 larger collections and multiple replicas for both scaling of query response
 and availability if nodes go down.

 -- Jack Krupansky

 -Original Message- From: rashmi maheshwari
 Sent: Tuesday, January 28, 2014 11:36 AM
 To: solr-user@lucene.apache.org
 Subject: Solr & Nutch


 Hi,

  Question1 -- When Solr can parse html and documents like doc, excel, pdf,
 etc., why do we need nutch to parse html files? what is different?

  Questions 2: When do we use multiple cores in Solr? any practical business
 case when we need multiple cores?

  Question 3: When do we go for cloud? What is the meaning of implementing
 solr cloud?


 --
 Rashmi
 Be the change that you want to see in this world!
 www.minnal.zor.org
 disha.resolve.at
 www.artofliving.org



Re: Synonyms and spellings

2014-01-28 Thread Alexei Martchenko
2) There are some synonym lists on the web; they aren't always complete, but
I keep analyzing fields and tokens in order to polish my synonyms. And I
like to use tools like http://www.visualthesaurus.com/ to aid me.

Hope this helps :-)




2014-01-28 rashmi maheshwari maheshwari.ras...@gmail.com

 Hi,

  Questions 1) Why do we use the spellings file under the solr core conf
 folder? What spellings do we enter in it?

  Question 2): Implementing all synonyms is a tough thing. From where could
 i get a list with as many synonyms as we see in google search?




 --
 Rashmi
 Be the change that you want to see in this world!
 www.minnal.zor.org
 disha.resolve.at
 www.artofliving.org



Re: Solr Nutch

2014-01-28 Thread Alexei Martchenko
Well, not even Google parses those. I'm not sure about Nutch, but in some
crawlers (jSoup, I believe) there's an option to try to extract full URLs
from plain text, so you can capture some URLs in the form of
someClickFunction('http://www.someurl.com/whatever') or even when they are
in the middle of some paragraph. Sometimes it works beautifully; sometimes
it misleads you into parsing URLs shortened with an ellipsis in the middle.





2014-01-28 rashmi maheshwari maheshwari.ras...@gmail.com

 Thanks All for quick response.

 Today I crawled a webpage using nutch. This page has many links, but all
 anchor tags have href="#" and javascript written on the onClick event of
 each anchor tag to open a new page.

 So the crawler didn't crawl any of those links, which open using the
 onClick event and have "#" as the href value.

 How can these links be crawled using nutch?




 On Tue, Jan 28, 2014 at 10:54 PM, Alexei Martchenko 
 ale...@martchenko.com.br wrote:

  1) Plus, those files are binaries sometimes with metadata, specific
  crawlers need to understand them. html is a plain text
 
  2) Yes, different data schemes. Sometimes I replicate the same core and
  make some A-B tests with different weights, filters etc etc and some
 people
  like to creare CoreA and CoreB with the same schema and hammer CoreA with
  updates and commits and optmizes, they make it available for searches
 while
  hammering CoreB. Then swap again. This produces faster searches.
 
 
 
 
  2014-01-28 Jack Krupansky j...@basetechnology.com
 
   1. Nutch follows the links within HTML web pages to crawl the full
 graph
   of a web of pages.
  
   2. Think of a core as an SQL table - each table/core has a different
 type
   of data.
  
   3. SolrCloud is all about scaling and availability - multiple shards
 for
   larger collections and multiple replicas for both scaling of query
  response
   and availability if nodes go down.
  
   -- Jack Krupansky
  
   -Original Message- From: rashmi maheshwari
   Sent: Tuesday, January 28, 2014 11:36 AM
   To: solr-user@lucene.apache.org
   Subject: Solr & Nutch
  
  
   Hi,
  
   Question1 -- When Solr could parse html, documents like doc, excel pdf
   etc, why do we need nutch to parse html files? what is different?
  
   Questions 2: When do we use multiple cores in Solr? any practical
  business
   case when we need multiple cores?
  
   Question 3: When do we go for cloud? What is meaning of implementing
 solr
   cloud?
  
  
   --
   Rashmi
   Be the change that you want to see in this world!
   www.minnal.zor.org
   disha.resolve.at
   www.artofliving.org
  
 



 --
 Rashmi
 Be the change that you want to see in this world!
 www.minnal.zor.org
 disha.resolve.at
 www.artofliving.org



Re: boost a document which has a field not empty

2011-09-21 Thread Alexei Martchenko
Can you assign a doc boost at index time?

2011/9/21 Zoltan Altfatter altfatt...@gmail.com

 Hi,

 I have one entity called organisation. I am indexing their names to be able
 to search on them afterwards.
 I also store the organisation's website. Some organisations have a website,
 some don't.
 Can I achieve that, when searching for organisations, those which have a
 website are shown first, even if the match is only on the name?

 Thank you.

 Regards,
 Zoltan




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Schema fieldType y-m-d ?!?!

2011-09-14 Thread Alexei Martchenko
If you don't need date-specific functions and/or faceting, you can store it
as an int, like 20110914, and parse it in your application

but I don't recommend it... as a rule of thumb, dates should be stored as
dates; the millennium bug (Y2K bug) was all about 'saving some space',
remember?


Re: how can we do the solr scheduling in windows o/s?

2011-09-02 Thread Alexei Martchenko
Under Administrative Tools, select Task Scheduler.

New task, action: Run program/script; then you can call a java command line
like java -jar something.jar

the scheduler itself is pretty good, but the tasks it can perform are too
few... still, it can run Java programs via the command line.

2011/9/2 vighnesh svighnesh...@gmail.com

 hi all

  can anyone specify the procedure for solr scheduling on Windows OS?

  http://wiki.apache.org/solr/DataImportHandler#HTTPPostScheduler - i know
 this link, but i need a cron-job-like procedure on Windows.






 Regards,

 Ganesh.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-can-we-do-the-solr-scheduling-in-windows-o-s-tp3303679p3303679.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: Image results in Solr Search

2011-09-02 Thread Alexei Martchenko
Never done that before, but as far as I know, Tika does that job.

http://tika.apache.org/0.9/formats.html#Image_formats

2011/9/2 Jagdish Kumar jagdish.thapar...@hotmail.com


 Hi

  I am trying to index and search various types of files in Solr 3.3.0. I am
 able to index image files, but it fails to show these files in the results
 of any search operation.

  I am not aware of how Solr works for searching images; I mean, is it
 content-based or metadata-based? I am not sure.

 If any of you have done Image Searches with Solr , I request you to please
 help me out with this.

 Thanks
 Jagdish






Re: Solr and Encoding Issue?

2011-09-02 Thread Alexei Martchenko
What does the Analysis say? Put all words in both field value index and
query and compare them plz.

Have you tried to encode it manually in the url just in case?

2011/9/2 deniz denizdurmu...@gmail.com

 I am trying to implement multi-accented search on solr... basically i am
 using the ASCIIFoldingFilter to provide this feature... but i have a
 problem...


 http://localhost:8983/solr/select/?q=*francois*&version=2.2&start=0&rows=10&indent=on

 http://localhost:8983/solr/select/?q=*francois**&version=2.2&start=0&rows=10&indent=on

 http://localhost:8983/solr/select/?q=*françois*&version=2.2&start=0&rows=10&indent=on

 these three above work well and return correct results; however,

 http://localhost:8983/solr/select/?q=*françois**&version=2.2&start=0&rows=10&indent=on

 the link above returns 0 matching documents...

 anybody has any ideas on this? could it be because of an encoding issue?

 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-and-Encoding-Issue-tp3303627p3303627.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: Question on functions

2011-09-01 Thread Alexei Martchenko
We put it here:

<requestHandler name="whatever" class="solr.StandardRequestHandler"
    default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">...</str>
    <str name="pf">...</str>
    <str name="bf">recip(ms(NOW,sear_dataupdate),3.16e-11,1,1)</str>
    ...


2011/9/1 Craig Stadler cstadle...@hotmail.com

  Regarding:
  http://wiki.apache.org/solr/FunctionQuery#Date_Boosting

  Specifically: recip(ms(NOW/HOUR,mydatefield),3.16e-11,1,1).

  I am using dismax, and I am very unsure on where to put this or call the
  function... for example in the fq= param? in the q= param?

  Sample query:
  http://localhost:8983/solr/dismax/?q=george clooney&mm=48%25&debugQuery=off&indent=on&start=&rows=10
  If I want to factor in score/date (called creationdate)...

  recip(ms(NOW/HOUR,creationdate),3.16e-11,1,1).

  Help! and thanks so much for any examples or help..
  -Craig







Re: Solr 3.3 dismax MM parameter not working properly

2011-08-31 Thread Alexei Martchenko
I'm printing a big bold cheatsheet about it and stickin' it everywhere :-)

I wish I could change this thread's subject to alexei is not working
properly :-/

2011/8/30 Erick Erickson erickerick...@gmail.com

 Yep, that one takes a while to figure out, then
 I wind up re-figuring it out every time I have
 to change it G...

 Best
 Erick

 On Tue, Aug 30, 2011 at 6:36 PM, Alexei Martchenko
 ale...@superdownloads.com.br wrote:
   Hmmm I believe I discovered the problem.

   When you have something like this:

   2&lt;50% 6&lt;-60%

   you should read it from right to left and use the word MORE.

   MORE THAN SIX clauses: 60% are optional. MORE THAN TWO clauses (and that
   includes 3, 4, 5 and 6): half is mandatory.

   if you want a special rule for 2 terms, just add:

   1&lt;1 2&lt;50% 6&lt;-60%

   MORE THAN ONE clause (i.e. 2) should match 1.

   NOW this makes sense!
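In solrconfig.xml such a rule goes in the handler defaults, with the < characters escaped as &lt; (the values here are illustrative, not a recommendation):

```xml
<str name="mm">1&lt;1 2&lt;50% 6&lt;-60%</str>
```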
 
  2011/8/30 Alexei Martchenko ale...@superdownloads.com.br
 
  Anyone else strugglin' with dismax's MM parameter?
 
  We're having a problem here, seems that configs from 3 terms and more
 are
  being ignored by solr and it assumes previous configs.
 
   if I use <str name="mm">3&lt;1</str> or <str name="mm">3&lt;100%</str> I
   get the same results for a 3-term query.
   If I try <str name="mm">4&lt;25%</str> or <str name="mm">4&lt;100%</str> I
   also get same data for a 4-term query.
 
   I'm searching: windows service pack
   <str name="mm">1&lt;100% 2&lt;50% 3&lt;100%</str> - 13000 results
   <str name="mm">1&lt;100% 2&lt;50% 3&lt;1</str> - the same 13000 results
   <str name="mm">1&lt;100% 2&lt;50%</str> - very same 13000 results
   <str name="mm">1&lt;100% 2&lt;100%</str> - 93 results. seems that here i
   get the 3&lt;3 clause working.
   <str name="mm">2&lt;100%</str> - same 93 results, just in case.
   <str name="mm">2&lt;50%</str> - very same 13000 results as it should
   <str name="mm">2&lt;-50%</str> - 1121 results (weird)

   then i tried to control 3-term queries.

   <str name="mm">2&lt;-50% 3&lt;100%</str> - 1121, the same as 2&lt;-50%,
   ignoring the 3 clause.
   <str name="mm">2&lt;-50% 3&lt;1</str> - the same 1121 results, ignoring
   it again.

   I'd like to accomplish something like this:
   <str name="mm">2&lt;1 3&lt;2 4&lt;3 8&lt;-50%</str>

   translating: 1 or 2 - 1 term, 3 at least 2, 4 at least 3, and 5, 6, 7, 8
   terms at least half rounded up (5-3, 6-3, 7-4, 8-4)

   seems that he's only using the 1 and 2 clauses.
 
  thanks in advance
 
  alexei
 
 
 
 
 






Re: Field grouping?

2011-08-31 Thread Alexei Martchenko
Yes, Ranged Facets
http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range
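Note that Solr's range faceting uses fixed-width buckets you define up front, not equal-count buckets, so the doc counts per range will vary. A sketch of the parameters as handler defaults (field name and bounds are illustrative):

```xml
<lst name="defaults">
  <str name="facet">true</str>
  <str name="facet.range">price</str>
  <str name="facet.range.start">0</str>
  <str name="facet.range.end">1000</str>
  <str name="facet.range.gap">100</str>   <!-- ten buckets of width 100 -->
</lst>
```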

2011/8/31 Denis Kuzmenok forward...@ukr.net

 Hi.

 Suppose  i  have  a field price with different values, and i want to
 get  ranges for this field depending on docs count, for example i want
 to  get 5 ranges for 100 docs with 20 docs in each range, 6 ranges for
 200 docs = 34 docs in each field, etc.

 Is it possible with solr?






Re: Solr Faceting DIH

2011-08-30 Thread Alexei Martchenko
I had the same problem with a database here, and we discovered that every
item had its own product page, its own url. So, we decided that our unique
id had to be the url instead of using sql ids and id concatenations.
sometimes it works. You can store all ids if u need them for something, but
for uniqueids, urls go just fine.
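Either approach - a stable URL or a concatenated id - boils down to building one deterministic key per logical document; a tiny illustrative sketch (field values taken from the table quoted below, the key format itself is my own):

```python
rows = [  # (productid, attributeid, valueid, categoryid)
    (10100039, 331100, 1580, 1),
    (10100039, 331694, 1581, 1),
    (10100039, 33113319, 1537370, 1),
]

def doc_id(productid, attributeid):
    # Deterministic composite uniqueKey: re-posting the same row
    # overwrites the same Solr document instead of losing rows.
    return f"{productid}-{attributeid}"

docs = {doc_id(p, a): {"productid": p, "attributeid": a,
                       "valueid": v, "categoryid": c}
        for p, a, v, c in rows}
print(len(docs))  # -> 3, one Solr doc per product/attribute pair
```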

2011/8/30 Erick Erickson erickerick...@gmail.com

 I'd really think carefully before disabling unique IDs. If you do,
 you'll have to manage the records yourself, so your next
 delta-import will add more records to your search result, even
 those that have been updated.

 You might do something like make the uniqueKey the
 concatenation of productid and attributeid or whatever
 makes sense.

 Best
 Erick

 On Mon, Aug 29, 2011 at 5:52 PM, Aaron Bains aaronba...@gmail.com wrote:
  Hello,
 
  I am trying to setup Solr Faceting on products by using the
  DataImportHandler to import data from my database. I have setup my
  data-config.xml with the proper queries and schema.xml with the fields.
  After the import/index is complete I can only search one productid record
 in
  Solr. For example of the three productid '10100039' records there are I
 am
  only able to search for one of those. Should I somehow disable unique
 ids?
  What is the best way of doing this?
 
  Below is the schema I am trying to index:
 
  +---+-+-++
  | productid | attributeid | valueid | categoryid |
  +---+-+-++
  |  10100039 |  331100 |1580 |  1 |
  |  10100039 |  331694 |1581 |  1 |
  |  10100039 |33113319 | 1537370 |  1 |
  |  10100040 |  331100 |1580 |  1 |
  |  10100040 |  331694 | 1540230 |  1 |
  |  10100040 |33113319 | 1537370 |  1 |
  +---+-+-++
 
  Thanks!
 






Solr 3.3 dismax MM parameter not working properly

2011-08-30 Thread Alexei Martchenko
Anyone else strugglin' with dismax's MM parameter?

We're having a problem here, seems that configs from 3 terms and more are
being ignored by solr and it assumes previous configs.

if I use <str name="mm">3<1</str> or <str name="mm">3<100%</str> i get
the same results for a 3-term query.
If i try <str name="mm">4<25%</str> or <str name="mm">4<100%</str> I
also get same data for a 4-term query.

I'm searching: windows service pack
<str name="mm">1<100% 2<50% 3<100%</str> - 13000 results
<str name="mm">1<100% 2<50% 3<1</str> - the same 13000 results
<str name="mm">1<100% 2<50%</str> - very same 13000 results
<str name="mm">1<100% 2<100%</str> - 93 results. seems that here i get
the 3-term clause working.
<str name="mm">2<100%</str> - same 93 results, just in case.
<str name="mm">2<50%</str> - very same 13000 results as it should
<str name="mm">2<-50%</str> - 1121 results (weird)

then i tried to control 3-term queries.

<str name="mm">2<-50% 3<100%</str> 1121, the same as 2<-50%, ignoring
the 3< clause.
<str name="mm">2<-50% 3<1</str> the same 1121 results, ignoring it
again.

I'd like to accomplish something like this:
<str name="mm">2<1 3<2 4<3 8<-50%</str>

translating: 1 or 2 terms - 1 required, 3 - at least 2, 4 - at least 3,
and 5, 6, 7, 8 terms - at least half rounded up (5-3, 6-3, 7-4, 8-4)

seems that it's only using the 1 and 2 clauses.

thanks in advance

alexei


Re: Solr 3.3 dismax MM parameter not working properly

2011-08-30 Thread Alexei Martchenko
Hmmm I believe I discovered the problem.

When you have something like this:

2<-50% 6<-60%

you should read it from right to left and use the words MORE THAN.

MORE THAN SIX clauses: 60% are optional; MORE THAN TWO clauses (and that
includes 3, 4, 5 AND 6): half is mandatory.

if you want a special rule for 2 terms just add:

1<1 2<-50% 6<-60%

MORE THAN ONE clause (i.e. 2) should match 1.

NOW this makes sense!
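To sanity-check that reading, here is a rough, unofficial model of the mm arithmetic (patterned after Lucene/Solr's calculateMinShouldMatch; the function name is mine, and Java's truncating integer division is emulated with int()):

```python
def min_should_match(spec: str, num_clauses: int) -> int:
    """Rough model of dismax mm: the last condition whose n is below
    the clause count wins; with no matching condition, all clauses
    are required. Negative values mean 'that many may be missing'."""
    required = num_clauses  # default: every clause must match
    for cond in spec.split():
        n, _, value = cond.partition("<")
        if num_clauses > int(n):
            if value.endswith("%"):
                pct = int(value[:-1])
                calc = int(num_clauses * pct / 100)  # truncate toward zero
                required = num_clauses + calc if pct < 0 else calc
            else:
                required = int(value)
    return max(required, 0)

# "MORE THAN SIX: 60% optional; MORE THAN TWO: half mandatory"
spec = "2<-50% 6<-60%"
print([min_should_match(spec, n) for n in range(1, 9)])
# -> [1, 2, 2, 2, 3, 3, 3, 4]
```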

2011/8/30 Alexei Martchenko ale...@superdownloads.com.br

 Anyone else strugglin' with dismax's MM parameter?

 We're having a problem here, seems that configs from 3 terms and more are
 being ignored by solr and it assumes previous configs.

 if I use <str name="mm">3<1</str> or <str name="mm">3<100%</str> i
 get the same results for a 3-term query.
 If i try <str name="mm">4<25%</str> or <str name="mm">4<100%</str> I
 also get same data for a 4-term query.

 I'm searching: windows service pack
 <str name="mm">1<100% 2<50% 3<100%</str> - 13000 results
 <str name="mm">1<100% 2<50% 3<1</str> - the same 13000 results
 <str name="mm">1<100% 2<50%</str> - very same 13000 results
 <str name="mm">1<100% 2<100%</str> - 93 results. seems that here i
 get the 3-term clause working.
 <str name="mm">2<100%</str> - same 93 results, just in case.
 <str name="mm">2<50%</str> - very same 13000 results as it should
 <str name="mm">2<-50%</str> - 1121 results (weird)

 then i tried to control 3-term queries.

 <str name="mm">2<-50% 3<100%</str> 1121, the same as 2<-50%, ignoring
 the 3< clause.
 <str name="mm">2<-50% 3<1</str> the same 1121 results, ignoring it
 again.

 I'd like to accomplish something like this:
 <str name="mm">2<1 3<2 4<3 8<-50%</str>

 translating: 1 or 2 - 1 term, 3 at least 2, 4 at least 3 and 5, 6, 7, 8
 terms at least half rounded up (5-3, 6-3, 7-4, 8-4)

 seems that it's only using the 1 and 2 clauses.

 thanks in advance

 alexei






Re: what is scheduling ? why should we do this?how to achieve this ?

2011-08-29 Thread Alexei Martchenko
since solr is basically an http server, all you need is a scheduler that
requests specific urls.

on windows, you can try the Task Scheduler (i don't know its name in english)
- it's the clock icon in the 'administrative tools' section.

coldfusion, for instance, has its own scheduler; other languages such as php
may have one you can use.

hope it helps.
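On Linux the equivalent is a cron job; a hypothetical crontab entry that triggers a DataImportHandler delta-import every night at 2am (host, core path and handler name are assumptions about your setup):

```cron
# install with: crontab -e
# m h dom mon dow  command
0 2 * * * curl -s "http://localhost:8983/solr/dataimport?command=delta-import" > /dev/null
```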

2011/8/29 nagarjuna nagarjuna.avul...@gmail.com

 Hi pravesh...
 i already saw the wiki page that what u have given...from that i got
 the points about collection distribution etc...
 but i didnt get any link which will explain the cron job process step
 by
 step for the windows OS ..
 can please tell me how to do it for windows?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/what-is-scheduling-why-should-we-do-this-how-to-achieve-this-tp3287115p3292221.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: getting old records in database

2011-08-27 Thread Alexei Martchenko
depends on the case.

we have a database here that updates very frequently, so we just added a
field named syncid and set it to the index day. every time the database
updates, it updates the syncid to the current day. after we perform a full
database update, we tell solr to delete all records with a syncid different
from the current one.

it's an xml with a delete query: -syncid:27 will delete all records not
updated in the day-27 update.

with databases that update constantly, it works.

if anyone else knows another solution, please share.
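That cleanup step can be sketched as one HTTP POST; the field name comes from the message above, while the URL and helper names are hypothetical (the XML is only built here, nothing is executed unless you point purge_stale at a real /update handler):

```python
import urllib.request

def delete_by_query_xml(current_syncid: int) -> str:
    # Everything whose syncid differs from the latest full import goes away.
    return f"<delete><query>-syncid:{current_syncid}</query></delete>"

def purge_stale(update_url: str, current_syncid: int) -> bytes:
    # Post the delete-by-query and commit in one request (needs a live Solr).
    body = delete_by_query_xml(current_syncid).encode("utf-8")
    req = urllib.request.Request(update_url + "?commit=true", data=body,
                                 headers={"Content-Type": "text/xml"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

print(delete_by_query_xml(27))
# -> <delete><query>-syncid:27</query></delete>
```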

2011/8/27 mss.mss mss.mss...@gmail.com


 hi

   we developed a solr and connected to database and getting the
 records from database. now we  deleted the records in table but iam getting
 the old
 records  in solr... to solve this what we have to do.

 how to solve this problem

 thanks in advance

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/getting-old-records-in-database-tp3288991p3288991.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Alexei Martchenko
Gary, please post the entire field declaration so I can try to reproduce
here

2011/8/26 Moore, Gary gary.mo...@ars.usda.gov


 I have a number of chemical names containing commas which I'm mapping in
 index_synonyms.txt thusly:

 2\,4-D-butotyl=Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D
 3,CCRIS 8562

 According to the sample synonyms.txt, the comma above should be escaped,
 i.e. a\,a=b\,b. The problem is that, according to analysis.jsp, the commas
 are not being escaped. If I paste in 2,4-D-butotyl, then no mappings. If I
 paste in 2\,4-D-butotyl, the mappings are done. This is verified by there
 being no mappings in the index. I assume there would be if 2\,4-D-butotyl
 actually appeared in a document.

 The filter I'm declaring in the index analyzer looks like this:

 <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
  tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
 expand="true"/>

 Doesn't seem to matter which tokenizer I use.This must be something
 simple that I'm not doing but am a bit stumped at the moment and would
 appreciate any tips.
 Thanks
 Gary







Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Alexei Martchenko
Gary, isn't your wordDelimiter removing your commas in the query time? have
u tried it in the analyzer?
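When a synonyms file is generated from data (chemical names are full of commas), the escaping is easy to automate; a hypothetical helper following the backslash-escaping convention shown in the sample synonyms.txt:

```python
def escape_synonym(term: str) -> str:
    """Backslash-escape the separator characters a synonyms file
    gives special meaning to (',' between alternatives, '=' in mappings)."""
    return (term.replace("\\", "\\\\")
                .replace(",", "\\,")
                .replace("=", "\\="))

names = ["Aqua-Kleen", "BRN 1996617", "Bladex-B"]
line = escape_synonym("2,4-D-butotyl") + "=" + ",".join(map(escape_synonym, names))
print(line)  # -> 2\,4-D-butotyl=Aqua-Kleen,BRN 1996617,Bladex-B
```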

2011/8/26 Moore, Gary gary.mo...@ars.usda.gov

 Here you go -- I'm just hacking the text field at the moment.  Thanks,
 Gary

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
     tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
     expand="true"/>
    <!-- Case insensitive stop word removal.
     enablePositionIncrements=true ensures that a 'gap' is left to
     allow for accurate phrase queries. -->
    <filter class="solr.StopFilterFactory"
     ignoreCase="true"
     words="stopwords.txt"
     enablePositionIncrements="true"
     />
    <filter class="solr.WordDelimiterFilterFactory"
     generateWordParts="1" generateNumberParts="1" catenateWords="1"
     catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
     protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!--filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
     ignoreCase="true" tokenizerFactory="solr.KeywordTokenizerFactory"
     expand="true"/-->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
     words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
     generateWordParts="1" generateNumberParts="1" catenateWords="0"
     catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
     protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
 </fieldType>

 -Original Message-
 From: Alexei Martchenko [mailto:ale...@superdownloads.com.br]
 Sent: Friday, August 26, 2011 10:30 AM
 To: solr-user@lucene.apache.org
 Subject: Re: commas in synonyms.txt are not escaping

 Gary, please post the entire field declaration so I can try to reproduce
 here







Re: hierarchical faceting in Solr?

2011-08-24 Thread Alexei Martchenko
Cheers, very good, congratulations

2011/8/23 Naomi Dushay ndus...@stanford.edu

 Chris Beer just did a revamp of the wiki page at:

  
  http://wiki.apache.org/solr/HierarchicalFaceting

 Yay Chris!

 - Naomi
 ( ... and I helped!)


 On Aug 22, 2011, at 10:49 AM, Naomi Dushay wrote:

  Chris,

 Is there a document somewhere on how to do this?  If not, might you create
 one?   I could even imagine such a document living on the Solr wiki ...
  this one has mostly ancient content:

  http://wiki.apache.org/solr/HierarchicalFaceting

 - Naomi







Re: Field type change / copy field

2011-08-24 Thread Alexei Martchenko
have u tried in your facet_year index analyzer something like this?

<analyzer type="index">
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\d{4})"
  replacement="$1-01-01T00:00:00Z" replace="all"/>

this can theoretically do the trick
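The same transformation can be checked outside Solr; the regex below mirrors the charFilter idea above, and preprocessing before posting would look the same (function name is mine):

```python
import re

def year_to_solr_date(raw: str) -> str:
    # Expand a bare 4-digit year into the canonical TrieDateField form.
    return re.sub(r"^(\d{4})$", r"\1-01-01T00:00:00Z", raw.strip())

print(year_to_solr_date("2008"))  # -> 2008-01-01T00:00:00Z
```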


2011/8/24 Oliver Schihin oliver.schi...@unibas.ch

 Hello list

 My documents come with a field holding a date, always a year:
 year2008/year

 In the schema, this content is taken for a field year as an integer, and
 it will be searchable.

 Through a copyfield-instruction I move the year to a facet_year-field,
 you guess, to use it for faceting and make range queries possible. Its field
 type is of the class 'solr.TrieDateField' that requires canonical date
 representation. Is there a way in solr to extend the simple year to
 facet_year2008-01-01T00:00:**00Z/facet_year. Or, do i have to solve
 the problem in preprocessing, before posting?

 Thanks
 Oliver






Re: Problem using stop words

2011-08-22 Thread Alexei Martchenko
Funny thing is that stopwords files in the examples shown in
http://wiki.apache.org/solr/LanguageAnalysis#Spanish are actually using pipe
and other terms. See the spanish one in
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt

I never saw this format before.

Lucas, try to use only one word per line, no pipes, no trailing spaces. and
you can use all spanish accents too. Don't forget to save encoded as
UTF-8... u can do that in Eclipse or even Windows Word can open and save
txts in UTF-8.
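If you want to reuse the snowball files as-is, stripping the pipe comments down to Solr's one-word-per-line format is trivial to script; a hypothetical converter:

```python
def snowball_to_solr(lines):
    """Convert snowball-format stop word lines ('word | comment')
    into Solr's plain one-word-per-line format."""
    out = []
    for line in lines:
        word = line.split("|", 1)[0].strip()
        if word:                      # drop comment-only and blank lines
            out.append(word)
    return out

sample = ["de |  from, of", "la |  the, her", "| comment only", ""]
print(snowball_to_solr(sample))  # -> ['de', 'la']
```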



2011/8/22 Erick Erickson erickerick...@gmail.com

 What does the admin/analysis page show? And if you're really
 putting the pipe symbol (|)  in you stopwords file, I have no clue what
 Solr will make of it. The stopwords file format is usually just one
 word per line.

 I'm assuming your name of string for the field type is just a placeholder
 or you've replaced the example string fieldType, right?


 Best
 Erick

 On Mon, Aug 22, 2011 at 6:24 AM, Lucas Miguez lucas.mig...@gmail.com
 wrote:
  Hi,
 
  I am trying to use spanish stop words, but the stop words are not
 working:
 
  Part of the schema.xml file:
 
   <fieldtype name="string" class="solr.TextField"
   positionIncrementGap="100" autoGeneratePhraseQueries="true">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.SnowballPorterFilterFactory"
       language="Spanish"/>
       <filter class="solr.StopFilterFactory"
       words="spanish_stop.txt"
       enablePositionIncrements="true" ignoreCase="true"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.SnowballPorterFilterFactory"
       language="Spanish"/>
       <filter class="solr.StopFilterFactory"
       words="spanish_stop.txt"
       enablePositionIncrements="true" ignoreCase="true"/>
     </analyzer>
   </fieldtype>
 
 ___
 
  A piece of the stopwords file:
 
  de |  from, of
  la |  the, her
  que|  who, that
  el |  the
  en |  in
  y  |  and
  a  |  to
  los|  the, them
  del|  de + el
  se |  himself, from him etc
  las|  the, them
  por|  for, by, etc
  un |  a
  para   |  for
  con|  with
  no |  no
  una|  a
  su |  his, her
  al |  a + el
   es |  es from SER
  lo |  him
 
 
  Any idea? Thanks!
 






Re: Problem using stop words

2011-08-22 Thread Alexei Martchenko
That very txt said A Spanish stop word list. Comments begin with vertical
bar. Each stop word is at the start of a line.

Solr's comments are #s not pipes.

Brazilian stopwords file is kinda raw...
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/br/stopwords.txt

2011/8/22 Alexei Martchenko ale...@superdownloads.com.br

 Funny thing is that stopwords files in the examples shown in
 http://wiki.apache.org/solr/LanguageAnalysis#Spanish are actually using
 pipe and other terms. See the spanish one in
 http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt

 I never saw this format before.

 Lucas, try to use only one word per line, no pipes, no trailing spaces. and
 you can use all spanish accents too. Don't forget to save encoded as
 UTF-8... u can do that in Eclipse or even Windows Word can open and save
 txts in UTF-8.



 2011/8/22 Erick Erickson erickerick...@gmail.com

 What does the admin/analysis page show? And if you're really
 putting the pipe symbol (|)  in you stopwords file, I have no clue what
 Solr will make of it. The stopwords file format is usually just one
 word per line.

 I'm assuming your name of string for the field type is just a
 placeholder
 or you've replaced the example string fieldType, right?


 Best
 Erick

 On Mon, Aug 22, 2011 at 6:24 AM, Lucas Miguez lucas.mig...@gmail.com
 wrote:
  Hi,
 
  I am trying to use spanish stop words, but the stop words are not
 working:
 
  Part of the schema.xml file:
 
  fieldtype name=string  class=solr.TextField
  positionIncrementGap=100 autoGeneratePhraseQueries=true
analyzer type=index
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory /
 filter class=solr.SnowballPorterFilterFactory
 language=Spanish /
 filter class=solr.StopFilterFactory
 words=spanish_stop.txt
  enablePositionIncrements=true ignoreCase=true /
/analyzer
analyzer type=query
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory /
 filter class=solr.SnowballPorterFilterFactory
 language=Spanish /
 filter class=solr.StopFilterFactory
 words=spanish_stop.txt
  enablePositionIncrements=true  ignoreCase=true /
 /analyzer
/fieldtype
 
 ___
 
  A piece of the stopwords file:
 
  de |  from, of
  la |  the, her
  que|  who, that
  el |  the
  en |  in
  y  |  and
  a  |  to
  los|  the, them
  del|  de + el
  se |  himself, from him etc
  las|  the, them
  por|  for, by, etc
  un |  a
  para   |  for
  con|  with
  no |  no
  una|  a
  su |  his, her
  al |  a + el
   | es from SER
  lo |  him
 
 
  Any idea? Thanks!
 




 --

 *Alexei Martchenko* | *CEO* | Superdownloads
 ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
 5083.1018/5080.3535/5080.3533






Re: Problem using stop words

2011-08-22 Thread Alexei Martchenko
No, I think you're right, i've never seen pipes as comments before...

2011/8/22 Erick Erickson erickerick...@gmail.com

 Ahh, you're right. I was wy off base there

 So I guess the question is how you know the words aren't being removed? A
 common
 problem is to look at *stored* fields rather than what's actually in
 the inverted index.
 The TermsComponent can help here:
 http://wiki.apache.org/solr/TermsComponent

 Erick

 On Mon, Aug 22, 2011 at 11:28 AM, Alexei Martchenko
 ale...@superdownloads.com.br wrote:
  That very txt said A Spanish stop word list. Comments begin with
 vertical
  bar. Each stop word is at the start of a line.
 
  Solr's comments are #s not pipes.
 
  Brazilian stopwords file is kinda raw...
 
 http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/br/stopwords.txt
 
  2011/8/22 Alexei Martchenko ale...@superdownloads.com.br
 
  Funny thing is that stopwords files in the examples shown in
  http://wiki.apache.org/solr/LanguageAnalysis#Spanish are actually using
  pipe and other terms. See the spanish one in
 
 http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
 
  I never saw this format before.
 
  Lucas, try to use only one word per line, no pipes, no trailing spaces.
 and
  you can use all spanish accents too. Don't forget to save encoded as
  UTF-8... u can do that in Eclipse or even Windows Word can open and save
  txts in UTF-8.
 
 
 
  2011/8/22 Erick Erickson erickerick...@gmail.com
 
  What does the admin/analysis page show? And if you're really
  putting the pipe symbol (|)  in you stopwords file, I have no clue what
  Solr will make of it. The stopwords file format is usually just one
  word per line.
 
  I'm assuming your name of string for the field type is just a
  placeholder
  or you've replaced the example string fieldType, right?
 
 
  Best
  Erick
 
  On Mon, Aug 22, 2011 at 6:24 AM, Lucas Miguez lucas.mig...@gmail.com
  wrote:
   Hi,
  
   I am trying to use spanish stop words, but the stop words are not
  working:
  
   Part of the schema.xml file:
  
   fieldtype name=string  class=solr.TextField
   positionIncrementGap=100 autoGeneratePhraseQueries=true
 analyzer type=index
  tokenizer class=solr.WhitespaceTokenizerFactory/
  filter class=solr.LowerCaseFilterFactory /
  filter class=solr.SnowballPorterFilterFactory
  language=Spanish /
  filter class=solr.StopFilterFactory
  words=spanish_stop.txt
   enablePositionIncrements=true ignoreCase=true /
 /analyzer
 analyzer type=query
  tokenizer class=solr.WhitespaceTokenizerFactory/
  filter class=solr.LowerCaseFilterFactory /
  filter class=solr.SnowballPorterFilterFactory
  language=Spanish /
  filter class=solr.StopFilterFactory
  words=spanish_stop.txt
   enablePositionIncrements=true  ignoreCase=true /
  /analyzer
 /fieldtype
  
 
 ___
  
   A piece of the stopwords file:
  
   de |  from, of
   la |  the, her
   que|  who, that
   el |  the
   en |  in
   y  |  and
   a  |  to
   los|  the, them
   del|  de + el
   se |  himself, from him etc
   las|  the, them
   por|  for, by, etc
   un |  a
   para   |  for
   con|  with
   no |  no
   una|  a
   su |  his, her
   al |  a + el
| es from SER
   lo |  him
  
  
   Any idea? Thanks!
  
 
 
 
 
  --
 
  *Alexei Martchenko* | *CEO* | Superdownloads
  ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
  5083.1018/5080.3535/5080.3533
 
 
 
 
  --
 
  *Alexei Martchenko* | *CEO* | Superdownloads
  ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
  5083.1018/5080.3535/5080.3533
 






Re: How to implement Spell Checker using Solr?

2011-08-22 Thread Alexei Martchenko
What is the error?

2011/8/22 anupamxyz cse.anu...@gmail.com

 The changes for Solrconfig.xml in solr are as follows
 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

   <lst name="spellchecker">

     <str name="name">default</str>

     <str name="classname">solr.IndexBasedSpellChecker</str>

     <str name="field">spell</str>

     <str name="spellcheckIndexDir">./spellchecker</str>

     <str name="accuracy">0.7</str>

     <float name="thresholdTokenFrequency">.0001</float>
   </lst>

   <lst name="spellchecker">
     <str name="name">jarowinkler</str>
     <str name="field">lowerfilt</str>

     <str
 name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
     <str name="spellcheckIndexDir">./spellchecker</str>

   </lst>

   <str name="queryAnalyzerFieldType">textSpell</str>
 </searchComponent>

 And for the Request handler, I have incorporated the following changes:


 <requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
   <lst name="defaults">

     <str name="spellcheck">true</str>

     <str name="spellcheck.onlyMorePopular">false</str>

     <str name="spellcheck.dictionary">default</str>

     <str name="spellcheck.extendedResults">false</str>

     <str name="spellcheck.count">5</str>
     <str name="spellcheck.build">true</str>
     <str name="spellcheck.collate">true</str>
   </lst>
   <arr name="last-components">
     <str>spellcheck</str>
   </arr>
 </requestHandler>

 The same is failing while crawling. I have reveretd my code for now. But
 can
 try it once again and post the exception that I have been getting while
 crawling.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-implement-Spell-Checker-using-Solr-tp3268450p3274069.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: how can i develop client application with solr url using javascript?

2011-08-22 Thread Alexei Martchenko
before setting up your solr to respond directly to jquery, did you manage to
bulletproof it against unwanted deletes? how will you protect your database?
be careful before exposing solr directly to 'the world'.
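A common precaution is to never expose /solr to browsers at all: route requests through a thin server-side proxy that only allows read-only queries. The qt and stream.body parameters mentioned below are real Solr request parameters; the whitelist itself is a minimal, hypothetical sketch:

```python
ALLOWED_PATHS = {"/solr/select"}            # never /update or /admin
ALLOWED_PARAMS = {"q", "start", "rows", "wt", "fq"}

def is_allowed(path: str, params: dict) -> bool:
    """Reject anything but plain select queries; in particular qt and
    stream.* parameters, which can re-route or inject whole requests."""
    if path not in ALLOWED_PATHS:
        return False
    return all(p in ALLOWED_PARAMS for p in params)

assert is_allowed("/solr/select", {"q": "windows", "wt": "json"})
assert not is_allowed("/solr/update", {"stream.body": "<delete/>"})
assert not is_allowed("/solr/select", {"qt": "/update"})
```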

2011/8/22 nagarjuna nagarjuna.avul...@gmail.com

 hi everybody ,
i have solr url which produces json response format ...i would like to
 develop a client application using javascript which is automatic search
 field please send me any samples or any sample code.
 i need to use my solr url in jscript or jquery file to implement automatic
 search field




 Thanks in advance

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-can-i-develop-client-application-with-solr-url-using-javascript-tp3275506p3275506.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: hl.useFastVectorHighlighter, fragmentsBuilder and HighlightingParameters

2011-08-19 Thread Alexei Martchenko
Hi Koji, thanks, it's loading right now. Can't say it's really working
though, but I believe those are other issues with FastVectorHighlighter

2011/8/18 Koji Sekiguchi k...@r.email.ne.jp

 (11/08/19 4:14), Alexei Martchenko wrote:

 Hi Koji thanks for the reply.

  My fragmentsBuilder is defined directly in <config>. SOLR 3.3 warns me
  <highlighting> is a deprecated form; do you think it is in the wrong
  place?


 Hi Alexei,

  Yes, it is incorrect. What is deprecated is the <highlighting> tag just
  under <config> directly.
  After 3.1, it needs to be under the <searchComponent> for HighlightComponent.
  Please consult
  solrconfig.xml in example 3.3.


 koji
 --
 Check out Query Log Visualizer
  http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
 http://www.rondhuit.com/en/






Re: ClassNotFoundException when trying to make spellcheck JaraWinkler working

2011-08-18 Thread Alexei Martchenko
Good knowledge for everybody; those little mistakes like spaces, typos and
missing commas make us lose so much time. thanks for posting this.

2011/8/18 Mike Mander wicket-m...@gmx.de

 Solution found.
 The original solr-config.xml jarowinkler definition had some line breaks.
 If i write the definition in one line (no tabs, no line breaks) the server
 starts without exception


 <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>


 Thanks for helping me
 Mike

  Hi Mike, is your config like this?
 Is queryAnalyzerFieldType matching your type of field to be indexed?
 Is the field correct?

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <str name="queryAnalyzerFieldType">textSpell</str>
 <lst name="spellchecker">
 <str name="name">jarowinkler</str>
 <str name="field">sear_spellterms</str>
 <str name="buildOnCommit">false</str>
 <str name="buildOnOptimize">true</str>
 <str
 name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
 <str name="spellcheckIndexDir">./spellchecker_jarowinkler</str>
 </lst>
 </searchComponent>

 2011/8/17 Mike Manderwicket-m...@gmx.de

  Hello,

 i get a ClassNotFoundException for JaraWinklerDistance when i start the
 solr example server.
 I simply copied the server and uncommented the spellchecker in
 example/conf/solr-config.xml
 I did nothing else.

 I already googled but didn't get a hint. Can someone help me please.

 Thanks
 Mike

 Stacktrace:

 C:\Users\m.mander\Desktop\temp\apache-solr-3.3.0\example>java -jar
 start.jar
 2011-08-17 14:55:20.379:INFO::Logging to STDERR via
 org.mortbay.log.StdErrLog
 2011-08-17 14:55:20.462:INFO::jetty-6.1-SNAPSHOT
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 locateSolrHome
 INFO: JNDI not configured for solr (NoInitialContextEx)
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 locateSolrHome
 INFO: solr home defaulted to 'solr/' (could not find system property or
 JNDI)
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoaderinit
 INFO: Solr home set to 'solr/'
 17.08.2011 14:55:20 org.apache.solr.servlet.SolrDispatchFilter init
 INFO: SolrDispatchFilter.init()
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 locateSolrHome
 INFO: JNDI not configured for solr (NoInitialContextEx)
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 locateSolrHome
 INFO: solr home defaulted to 'solr/' (could not find system property or
 JNDI)
 17.08.2011 14:55:20 org.apache.solr.core.CoreContainer$Initializer
 initialize
 INFO: looking for solr.xml: C:\Users\m.mander\Desktop\temp\apache-solr-3.3.0\example\solr\solr.xml
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 locateSolrHome
 INFO: JNDI not configured for solr (NoInitialContextEx)
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 locateSolrHome
 INFO: solr home defaulted to 'solr/' (could not find system property or
 JNDI)
 17.08.2011 14:55:20 org.apache.solr.core.CoreContainerinit
 INFO: New CoreContainer: solrHome=solr/ instance=22725577
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoaderinit
 INFO: Solr home set to 'solr/'
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoaderinit
 INFO: Solr home set to 'solr\.\'
 17.08.2011 14:55:20 org.apache.solr.core.SolrConfig initLibs
 INFO: Adding specified lib dirs to ClassLoader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-3.1.jar' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-LICENSE-BSD_LIKE.txt' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-NOTICE.txt' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-LICENSE-BSD_LIKE.txt' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-NOTICE.txt' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader
 replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to

Re: suggester issues

2011-08-18 Thread Alexei Martchenko
It can be done, I did that with shingles, but it's not the way it's meant to
be. The main problem with the suggester is that we want compound words and we
never get them. I try to get internet explorer, but when i enter the
second word, internet e, the suggester never finds explorer.
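The shingle workaround mentioned above can be sketched outside Solr: index two-word shingles and prefix-match the whole typed phrase against them (the field handling here is purely illustrative):

```python
def shingles(text: str, size: int = 2):
    # Build word shingles: "internet explorer 9" ->
    # ["internet explorer", "explorer 9"]
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]

def suggest(prefix: str, titles):
    """Prefix-match the full typed phrase against word shingles,
    so 'internet e' still reaches 'internet explorer'."""
    prefix = prefix.lower()
    return sorted({s for t in titles for s in shingles(t)
                   if s.startswith(prefix)})

titles = ["Internet Explorer 9", "Internet Download Manager"]
print(suggest("internet e", titles))  # -> ['internet explorer']
```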

2011/8/18 oberman_cs ober...@civicscience.com

 I was trying to deal with the exact same issue, with the exact same
 results.
 Is there really no way to feed a phrase into the suggester (spellchecker)
 without it splitting the input phrase into words?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3265803.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: hl.useFastVectorHighlighter, fragmentsBuilder and HighlightingParameters

2011-08-18 Thread Alexei Martchenko
Hi Koji thanks for the reply.

My fragmentsBuilder is defined directly in the config. Solr 3.3 warns me that
the highlighting config is in a deprecated form; do you think it is in the
wrong place?

2011/8/17 Koji Sekiguchi k...@r.email.ne.jp

 Alexei,

 From the log, I think Solr couldn't find the "colored" fragmentsBuilder defined
 in solrconfig.xml.
 Can you check the <fragmentsBuilder/> setting inside
 <searchComponent><highlighting> ... </highlighting></searchComponent> in solrconfig.xml?
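
 For reference, a minimal sketch of that nesting (the searchComponent name and
 class below are the standard Solr 3.x highlight component; the fragmentsBuilder
 attributes mirror Alexei's own config, so treat the exact values as
 illustrative):

```xml
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <!-- fragmentsBuilder must be declared here, inside <highlighting>,
         not at the top level of <config> -->
    <fragmentsBuilder name="colored"
        class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
      <lst name="defaults">
        <str name="hl.tag.pre"><![CDATA[<b style="background:yellow">]]></str>
        <str name="hl.tag.post"><![CDATA[</b>]]></str>
      </lst>
    </fragmentsBuilder>
  </highlighting>
</searchComponent>
```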

 koji
 --
 Check out Query Log Visualizer
  http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
 http://www.rondhuit.com/en/


 (11/08/16 8:51), Alexei Martchenko wrote:

  I'm having some trouble trying to upgrade my old highlighter from the
  <highlighting><fragmenter><formatter> format (1.4 version, default
  config on the Solr website) to the new FastVectorHighlighter.

  I'm using Solr 3.3.0 with <luceneMatchVersion>LUCENE_33</luceneMatchVersion>
  in <config>.

  In my solrconfig.xml I added these lines

  in the default request handler:

  <bool name="hl.useFastVectorHighlighter">true</bool>
  <bool name="hl.usePhraseHighlighter">true</bool>
  <bool name="hl.highlightMultiTerm">true</bool>
  <str name="hl.fragmentsBuilder">colored</str>

  and

  <fragmentsBuilder name="colored"
      class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
    <lst name="defaults">
      <str name="hl.tag.pre"><![CDATA[
        <b style="background:yellow">,<b style="background:lawgreen">,
        <b style="background:aquamarine">,<b style="background:magenta">,
        <b style="background:palegreen">,<b style="background:coral">,
        <b style="background:wheat">,<b style="background:khaki">,
        <b style="background:lime">,<b style="background:deepskyblue">]]></str>
      <str name="hl.tag.post"><![CDATA[</b>]]></str>
    </lst>
  </fragmentsBuilder>

 All I get is: ('grave' means severe)

  15/08/2011 20:44:19 org.apache.solr.common.SolrException log
  GRAVE: org.apache.solr.common.SolrException: Unknown fragmentsBuilder: colored
      at org.apache.solr.highlight.DefaultSolrHighlighter.getSolrFragmentsBuilder(DefaultSolrHighlighter.java:320)
      at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:508)
      at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:376)
      at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
      at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
      at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
      at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
      at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
      at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
      at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
      at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
      at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
      at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
      at org.mortbay.jetty.Server.handle(Server.java:326)
      at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
      at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
      at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
      at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
      at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
      at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
      at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

  Docs in http://wiki.apache.org/solr/HighlightingParameters say:

  hl.fragmentsBuilder

  Specify the name of SolrFragmentsBuilder
  (http://wiki.apache.org/solr/SolrFragmentsBuilder). [Solr3.1] This parameter
  makes sense

Re: ClassNotFoundException when trying to make spellcheck JaraWinkler working

2011-08-17 Thread Alexei Martchenko
Hi Mike, is your config like this?
Does queryAnalyzerFieldType match the type of the field to be indexed?
Is the field correct?

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">sear_spellterms</str>
    <str name="buildOnCommit">false</str>
    <str name="buildOnOptimize">true</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker_jarowinkler</str>
  </lst>
</searchComponent>

2011/8/17 Mike Mander wicket-m...@gmx.de

 Hello,

 I get a ClassNotFoundException for JaroWinklerDistance when I start the
 Solr example server.
 I simply copied the server and uncommented the spellchecker in
 example/conf/solrconfig.xml.
 I did nothing else.

 I already googled but didn't get a hint. Can someone help me please.

 Thanks
 Mike

 Stacktrace:

 C:\Users\m.mander\Desktop\temp\apache-solr-3.3.0\example>java -jar start.jar
 2011-08-17 14:55:20.379:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
 2011-08-17 14:55:20.462:INFO::jetty-6.1-SNAPSHOT
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
 INFO: JNDI not configured for solr (NoInitialContextEx)
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
 INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader <init>
 INFO: Solr home set to 'solr/'
 17.08.2011 14:55:20 org.apache.solr.servlet.SolrDispatchFilter init
 INFO: SolrDispatchFilter.init()
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
 INFO: JNDI not configured for solr (NoInitialContextEx)
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
 INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
 17.08.2011 14:55:20 org.apache.solr.core.CoreContainer$Initializer initialize
 INFO: looking for solr.xml: C:\Users\m.mander\Desktop\temp\apache-solr-3.3.0\example\solr\solr.xml
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
 INFO: JNDI not configured for solr (NoInitialContextEx)
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
 INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
 17.08.2011 14:55:20 org.apache.solr.core.CoreContainer <init>
 INFO: New CoreContainer: solrHome=solr/ instance=22725577
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader <init>
 INFO: Solr home set to 'solr/'
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader <init>
 INFO: Solr home set to 'solr\.\'
 17.08.2011 14:55:20 org.apache.solr.core.SolrConfig initLibs
 INFO: Adding specified lib dirs to ClassLoader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-3.1.jar' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-LICENSE-BSD_LIKE.txt' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-NOTICE.txt' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-LICENSE-BSD_LIKE.txt' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-NOTICE.txt' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-LICENSE-BSD_LIKE.txt' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-NOTICE.txt' to classloader
 17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.

Re: Solr 1.4.1 vs 3.3 (Speed)

2011-08-17 Thread Alexei Martchenko
I'm doing the exact same migration... what I've accomplished so far:

   1. In solrconfig.xml I put <luceneMatchVersion>LUCENE_33</luceneMatchVersion>
   as the first line inside the <config> branch. Warnings go like crazy if you
   don't do that.
   2. The highlighter shows a deprecation warning; I'm still working on that. It
   works, but I'd like to use the new FastVectorHighlighter, which I'm
   struggling with to death right now.
   3. All my speed measures come out about the same. Sometimes we lose
   60ms, sometimes we gain 60ms, so it's about average. I'll rebuild the index
   from scratch to see differences, maybe today or later this week.
   4. Since I had to turn on termVectors="true" termPositions="true"
   termOffsets="true" on 3 fields to use FastVectorHighlighter, I expect speed
   gains in HL.
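
Point 4, as a field-definition sketch (the field name "smalldesc" is just an
example borrowed from elsewhere in this thread; the three term* attributes are
the ones FastVectorHighlighter requires):

```xml
<!-- schema.xml: term vectors with positions and offsets enable FastVectorHighlighter -->
<field name="smalldesc" type="text" indexed="true" stored="true"
    termVectors="true" termPositions="true" termOffsets="true"/>
```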


2011/8/17 Samarendra Pratap samarz...@gmail.com

 Hi we are planning to migrate from solr 1.4.1 to solr 3.3 and I am doing a
 manual performance comparison.

 We have setup two different solr installations (1.4.1 and 3.3) on different
 ports.
  1. Both have same index (old lucene format index) of around 20 GB with 10
 million documents and 60 fields (40 fields with indexed=true).
  2. Both processes have  max 4GB memory allocated (-Xms2048m -Xmx4096m)
  3. Both installation are on same server (8 processor Intel(R) Core(TM) i7
 CPU 930 @ 2.80GHz, 8GB RAM, 64 bit linux system)
  4. We are running solr 1.4.1 with the collapsing patch
 (SOLR-236-1_4_1.patch, https://issues.apache.org/jira/browse/SOLR-236).

  When I pass exactly similar query to both the servers one by one solr
 1.4.1
 is more efficient than solr 3.3.
  Before I convert the index into LUCENE_33 format I thought it would be
 good
 to take the expert advice.

  Is there something which I should look into deeply? Or could this be
 effect
 of old index format with new version and should be ignored?

  When I used debugQuery=true, it clearly shows
 that org.apache.solr.handler.component.CollapseComponent (solr 1.4.1)
 noticeably taking less time
 than org.apache.solr.handler.component.QueryComponent (solr 3.3).

  I am testing this against simple queries without any faceting,
 highlighting, collapsing etc.:

 http://xxx.xxx:8983/solr/select/?q=Packaging%20Material,%20Supplies&qt=dismax&qf=category^4.0&qf=keywords^2.0&qf=title^2.0&qf=smalldesc&qf=companyname&qf=usercategory&qf=usrpcatdesc&qf=city&qs=10&pf=category^4.0&pf=keywords^3&pf=title^3&pf=smalldesc^1.5&pf=companyname&pf=usercategory&pf=usrpcatdesc&pf=city&ps=0&bq=type:[149%20TO%201500]^3&start=0&rows=50&fl=title,smalldesc,id&debugQuery=true

  Any insights by the experts would be greatly appreciated!

  Thanks in advance.

 --
 Regards,
 Samar




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Spell Checker

2011-08-17 Thread Alexei Martchenko
It's not a file; it's a request handler. You add those in solrconfig.xml.

Please read here: http://wiki.apache.org/solr/Suggester

2011/8/17 naeluh nae...@gmail.com

 Hi Dan,

 I saw this command -


 http://localhost:8983/solr/spell?q=ANYTHINGHERE&spellcheck=true&spellcheck.collate=true&spellcheck.build=true

 I tried to issue it and got a 404 error saying I did not have the path
 /solr/spell.
 Should I add this file, and what type of file is it?

 I got to it via the post on Drupal - http://drupal.org/node/975132

 thanks !

 Nick

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Spell-Checker-tp1914336p3262684.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: suggester issues

2011-08-17 Thread Alexei Martchenko
I have the very very very same problem. I could copy+paste your message as
mine. I've discovered so far that bigger dictionaries work better for me, and
controlling the threshold is much better than avoiding indexing one or two fields.
Of course I'm still polishing this.

At this very moment I was looking into Shingles, are you using them?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory

How are your fields?

2011/8/17 Kuba Krzemień krzemien.k...@gmail.com

 Hello, I am working on creating an auto-complete functionality for my
 platform which indexes large amounts of text (title + contents) - there is
 too much data for a dictionary. I am using the latest version of Solr (3.3)
 and I am trying to take advantage of the Suggester functionality.
 Unfortunately, so far the outcome isn't that great.

 The Suggester works only for single words or whole phrases (depending on the
 tokenizer). When using the first option, I am unable to suggest any combined
 queries. For example, the suggestion for 'ne' will be 'new'. The suggestion for
 'new y' will be two separate lists, one for 'new' and one for 'y'. What's
 worse, querying 'new AND y' gives the same results (also when using
 collate), which means that the returned suggestion may give no results -
 what makes sense separately often doesn't work combined. I need a way to
 find only those suggestions that will return results when doing an AND query
 (for example 'new AND york', 'new AND year', as long as they give results
 upon querying - 'new AND yeti' shouldn't be returned as a suggestion).

 When I use the second tokenizer and the suggestions return phrases, for
 'ne' I will get 'new york' and 'new year', but for 'new y' I will get
 nothing. Also, for 'y' I will get nothing, so the issue remains.

 If someone has some experience working with the Suggester, or if someone
 has created a well-working auto-suggester based on Solr, please help me.
 I've been trying to find a solution for this for quite some time.

 Yours sincerely,
 Jackob K




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: suggester issues

2011-08-17 Thread Alexei Martchenko
I've been indexing and reindexing stuff here with Shingles. I don't believe
it's the best approach. Results are interesting, but I believe it's not what
the suggester is meant to be.

I tried

<fieldType name="textSuggestion" class="solr.TextField"
    positionIncrementGap="10" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
        outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

but I got compound words in the suggestion itself.

If you query them like http://localhost:8983/solr/{mycore}/suggest/?q=dri I
get:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="dri">
        <int name="numFound">6</int>
        <int name="startOffset">0</int>
        <int name="endOffset">3</int>
        <arr name="suggestion">
          <str>drivers</str>
          <str>drivers nvidia</str>
          <str>drivers intel</str>
          <str>drivers nvidia geforce</str>
          <str>drive</str>
          <str>driver</str>
        </arr>
      </lst>
      <str name="collation">drivers</str>
    </lst>
  </lst>
</response>

but when I enter the second word,
http://localhost:8983/solr/{mycore}/suggest/?q=drivers%20n,
it scrambles everything:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="drivers">
        <int name="numFound">4</int>
        <int name="startOffset">0</int>
        <int name="endOffset">7</int>
        <arr name="suggestion">
          <str>drivers</str>
          <str>drivers nvidia</str>
          <str>drivers intel</str>
          <str>drivers nvidia geforce</str>
        </arr>
      </lst>
      <lst name="n">
        <int name="numFound">10</int>
        <int name="startOffset">8</int>
        <int name="endOffset">9</int>
        <arr name="suggestion">
          <str>nvidia</str>
          <str>net</str>
          <str>nvidia geforce</str>
          <str>network</str>
          <str>new</str>
          <str>n</str>
          <str>ninja</str>
        </arr>
      </lst>
      <str name="collation">drivers nvidia</str>
    </lst>
  </lst>
</response>

Although the collation seems fine for this, it's not exactly what the suggester
is supposed to do.

Any thoughts?
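
One workaround worth trying (a sketch, not something I've verified here): keep
the whole input as a single token on the suggest field, so a multi-word prefix
like "drivers n" is matched as one string instead of being split and looked up
word by word. The field-type name below is made up for illustration:

```xml
<!-- Sketch: KeywordTokenizer keeps "drivers n" as one token, so prefix
     lookups against whole indexed phrases are possible -->
<fieldType name="textSuggestWhole" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```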

2011/8/17 Alexei Martchenko ale...@superdownloads.com.br

 I have the very very very same problem. I could copy+paste your message as
 mine. I've discovered so far that bigger dictionaries work better for me,
 controlling threshold is much better than avoid indexing one or twio fields.
 Of course i'm still polishing this.

 At this very moment I was looking into Shingles, are you using them?
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory

 How are your fields?

 2011/8/17 Kuba Krzemień krzemien.k...@gmail.com

 Hello, I am working on creating a auto-complete functionality for my
 platform which indexes large ammounts of text (title + contents) - there is
 too much data for a dictionary. I am using the latest version of Solr (3.3)
 and I am trying to take advantage of the Suggester functionality.
 Unfortunately so far the outcome isn't that great.

 The Suggester works only for single words or whole phrases (depends on the
 tokenizer). When using the first option, I am unable to suggest any combined
 queries. For example the suggestion for 'ne' will be 'new'. Suggestion for
 'new y' will be two separate lists, one for 'new' and one for 'y'. Whats
 worse, querying 'new AND y' gives the same results (also when using
 collate), which means that the returned suggestion may give no results -
 what makes sense separately often doesn't work combined. I need a way to
 find only those suggestions, that will return results when doing a AND query
 (for example 'new AND york', 'new AND year', as long as they give results
 upon querying - 'new AND yeti' shouldn't be returned as a suggestion).

 When I use the second tokenizer and the suggestions return phrases, for
 'ne' I will get 'new york' and 'new year', but for 'new y' I will get
 nothing. Also, for 'y' I will get nothing, so the issue remains.

 If someone has some experience working with the Suggester, or if someone
 has created a well working auto-suggester based on Solr, please help me.
 I've been trying to find a sollution for this for quite some time.

 Yours sincerely,
 Jackob K




 --

 *Alexei Martchenko* | *CEO* | Superdownloads
 ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
 5083.1018/5080.3535/5080.3533




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Spell Checker

2011-08-17 Thread Alexei Martchenko
No. If you are trying to build a suggester (which it seems you are), please read
the URL I sent you.

You'll need to create the suggester itself:

<searchComponent class="solr.SpellCheckComponent" name="suggest">

and the URL handler:

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">

In your case, to make it work on that URL, just rename it to:

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/spell">
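
A minimal sketch of the two pieces together, adapted from the Suggester wiki
page (the source field name "name" and the parameter values are illustrative;
adapt them to your schema):

```xml
<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- illustrative: the field the dictionary is built from -->
    <str name="field">name</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler"
    name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```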

2011/8/17 naeluh nae...@gmail.com

 So I add spellcheck.build=true to solrconfig.xml, just anywhere, and that
 will work?

 Thanks very much for your help

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Spell-Checker-tp1914336p3262744.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Spell Checker

2011-08-17 Thread Alexei Martchenko
Configure your XML properly, reload your core (or restart Solr), then commit.
This spellchecker is configured to build on commit
(<str name="buildOnCommit">true</str>). Every time you commit something, it will
rebuild your dictionary based on the configuration you selected.

2011/8/17 naeluh nae...@gmail.com

 so I add spellcheck.build=true to solrconfig.xml  just anywhere and that
 will
 wrk?

 thks very much for your help

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Spell-Checker-tp1914336p3262744.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Solr spellcheck and multiple collations

2011-08-17 Thread Alexei Martchenko
Can you show us your schema and config?

I believe that's how collation works: the best match, only one.
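
(Worth double-checking against your Solr version, though: Solr 3.1 added
collation parameters to the SpellCheckComponent that are meant to produce
several collations. A request along these lines is the usual way to ask for
them; treat the parameter values as a sketch to verify:)

```
http://localhost:8983/solr/spell?q=YOUR+QUERY
    &spellcheck=true
    &spellcheck.collate=true
    &spellcheck.maxCollations=5
    &spellcheck.maxCollationTries=10
```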

2011/8/17 Herman Kiefus herm...@angieslist.com

 After a bit of work, we have 'spellchecking' up and going and we are happy
 with the suggestions. I have not, however, ever been able to generate more
 than one collation query. Is there something simple that I have overlooked?




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Solr spellcheck and multiple collations

2011-08-17 Thread Alexei Martchenko
Thank you very much for this awesome config. I'm working on it as we speak.

2011/8/17 Herman Kiefus herm...@angieslist.com

 If you only get one, best, collation then there is no point to my question;
 however, since you asked...

 The relevant sections:

 Solrconfig.xml -

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

   <str name="queryAnalyzerFieldType">textDictionary</str>

   <lst name="spellchecker">
     <str name="name">default</str>
     <str name="classname">solr.IndexBasedSpellChecker</str>
     <str name="field">TermsDictionary</str>
     <str name="spellcheckIndexDir">./spellchecker</str>
     <float name="thresholdTokenFrequency">0.0</float>
     <str name="comparatorClass">score</str>
   </lst>

 Schema.xml -

 <fieldType name="textCorrectlySpelled" class="solr.TextField"
     positionIncrementGap="100" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt"/>
     <filter class="solr.KeepWordFilterFactory"
         words="correctly_spelled_terms.txt" ignoreCase="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StandardFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
         synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StandardFilterFactory"/>
   </analyzer>
 </fieldType>

 <fieldType name="textDictionary" class="solr.TextField"
     positionIncrementGap="100" omitNorms="true">
   <!-- No index-time analysis as that is done by the fields that
       source fields of this type -->
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
         synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StandardFilterFactory"/>
   </analyzer>
 </fieldType>

 <field name="CorrectlySpelledTerms" type="textCorrectlySpelled"
     indexed="false" stored="false" multiValued="true"/>
 <field name="TermsDictionary" type="textDictionary" indexed="true"
     stored="false" multiValued="true"/>

 <!-- Those fields that will have misspellings stripped before they are put
     into the dictionary -->
 <copyField source="BusinessDescription" dest="CorrectlySpelledTerms"/>
 <copyField source="Services" dest="CorrectlySpelledTerms"/>
 <copyField source="ServiceArea" dest="CorrectlySpelledTerms"/>
 <copyField source="City" dest="CorrectlySpelledTerms"/>
 <copyField source="CategoryName" dest="CorrectlySpelledTerms"/>
 <copyField source="MedicalSpecialtyDescription" dest="CorrectlySpelledTerms"/>
 <copyField source="ReportComment" dest="CorrectlySpelledTerms"/>
 <copyField source="ReportDescription" dest="CorrectlySpelledTerms"/>
 <copyField source="ReportMediaDescription" dest="CorrectlySpelledTerms"/>
 <copyField source="AdditionalReportInformationAnswer"
     dest="CorrectlySpelledTerms"/>

 <!-- The dictionary source field -->
 <!-- Those fields that are not spell checked but rather appear in the
     dictionary as is -->
 <copyField source="Name" dest="TermsDictionary"/>
 <copyField source="Contact" dest="TermsDictionary"/>
 <!-- Plus the remainder of those fields that are spellchecked -->
 <copyField source="CorrectlySpelledTerms" dest="TermsDictionary"/>

 -Original Message-
 From: Alexei Martchenko [mailto:ale...@superdownloads.com.br]
 Sent: Wednesday, August 17, 2011 5:34 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr spellcheck and multiple collations

 Can u show us how is your schema and config?

 I believe that's how collation is: the best match, only one.

 2011/8/17 Herman Kiefus herm...@angieslist.com

  After a bit of work, we have 'spellchecking' up and going and we are
  happy with the suggestions.  I have not; however, ever been able to
  generate more than one collation query.  Is there something simple that I
 have overlooked?
 



 --

 *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br|
 ale...@martchenko.com.br | (11)
 5083.1018/5080.3535/5080.3533




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Random + Boost?

2011-08-16 Thread Alexei Martchenko
To make random results I'd use something related to dates and milliseconds,
not boosting. Let me think about this...

2011/8/16 Ahmet Arslan iori...@yahoo.com

  This might seem odd, but is it possible to use boost with
  random ordering?
  That is, documents that get boosted are more likely to
  appear towards the
  top of the ordering (I only display page 1, say 30
  documents). Does that
  make sense? I'm assuming that random ordering is, well,
  really random - so
  then it's not possible. But I figured I'd ask.
 
  My problem is that I want to display a random assortment of
  documents, but
  unfortunately certain types of documents far outnumber
  other types. So a
  random assortment ends up with 50% type A, 50% type B, C,
  D, E, F. So, I
  was thinking I would essentially boost types B, C, D, E,
  F until all types
  are approximately evenly represented in the random
  assortment. (Or
  alternatively, if the user has an affinity for type B
  documents, further
  boost type B documents so that they're more likely to be
  represented than
  other types).
 
  Anyone know if there's a way to do something like this in
  Solr?

 Sounds like you want to achieve diversity of results.

 Consider using http://wiki.apache.org/solr/FieldCollapsing

 Alternatively you can make use of RandomSortField with function queries.

 http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html
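
 For the RandomSortField route, the usual setup looks like this (per the linked
 javadoc; the dynamic-field name is the conventional pattern, adapt it to your
 schema):

```xml
<!-- schema.xml: a pseudo-field whose sort order is seeded by the field name -->
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>
```

 Then sort with a changing seed in the field name, e.g.
 ?q=*:*&sort=random_1234%20desc - requesting with a new seed gives a new
 ordering.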




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Unable to get multicore working

2011-08-16 Thread Alexei Martchenko
Let's try something simpler.
My start.jar is in \apache-solr-3.3.0\example\
Here's my local config, placed in \apache-solr-3.3.0\example\solr\

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="softwares01" instanceDir="softwares01" />
  </cores>
</solr>

Create \apache-solr-3.3.0\example\solr\softwares01\conf\
and \apache-solr-3.3.0\example\solr\softwares01\data\

http://localhost:8983/solr/ should work, and so should
http://localhost:8983/solr/softwares01/admin/
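
To verify the container actually registered the cores, you can also hit the
CoreAdmin STATUS action (the path follows the adminPath configured above):

```
http://localhost:8983/solr/admin/cores?action=STATUS
```

Each loaded core should appear under <lst name="status">; an empty
<lst name="status"/>, like the output David posted, means no cores were
registered at startup.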



2011/8/16 David Sauve dnsa...@gmail.com

 I've been trying (unsuccessfully) to get multicore working for about a day
 and a half now I'm nearly at wits end and unsure what to do anymore. **Any**
 help would be appreciated.

 I've installed Solr using the solr-jetty packages on Ubuntu 10.04. The
 default Solr install seems to work fine.

 Now, I want to add three cores: live, staging, preview to be used for the
 various states of the site.

 I've created a `solr.xml` file as follows and symlinked it in to
 /usr/share/solr:

  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="false">
    <cores adminPath="/admin/cores">
      <core name="preview" instanceDir="/home/webteam/config/search/preview"
          dataDir="/home/webteam/preview/data" />
      <core name="staging" instanceDir="/home/webteam/config/search/staging"
          dataDir="/home/webteam/staging/data" />
      <core name="live" instanceDir="/home/webteam/config/search/live"
          dataDir="/home/webteam/live/data" />
    </cores>
  </solr>

 Now, when I try to view any cores, I get a 404 - Not found. In fact, I
 can't even view /solr/admin/ anymore after installing that `solr.xml` file.

 Also, /solr/admin/cores returns an XML file, but it looks to me like
 there's no cores listed. The output:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
    </lst>
    <lst name="status"/>
  </response>


 Finally, looking through the logs produced by Jetty doesn't seem to reveal
 any clues about what is wrong. There doesn't seem to be any errors in there,
 except the 404s.

 Long story short. I'm stuck. Any suggestions on where to go with this?

 David




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Unable to get multicore working

2011-08-16 Thread Alexei Martchenko
AFAIK you're still seeing the single-core version.

Where is your start.jar?

Please search for solr.xml and see how many you've got.

2011/8/16 David Sauve dnsa...@gmail.com

  I've installed using aptitude so I don't have an example folder (that I
 can find).

 /solr/ does work (but lists no cores)
 /solr/live/admin/ does not -- 404


 On Tuesday, 16 August, 2011 at 1:13 PM, Alexei Martchenko wrote:

  Lets try something simplier.
  My start.jar is on \apache-solr-3.3.0\example\
  Here's my local config placed in \apache-solr-3.3.0\example\solr\
 
  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="softwares01" instanceDir="softwares01" />
    </cores>
  </solr>
 
  Create \apache-solr-3.3.0\example\solr\softwares01\conf\
  and \apache-solr-3.3.0\example\solr\softwares01\data\
 
  http://localhost:8983/solr/ should work and so is
  http://localhost:8983/solr/softwares01/admin/
 
 
 
  2011/8/16 David Sauve dnsa...@gmail.com (mailto:dnsa...@gmail.com)
 
   I've been trying (unsuccessfully) to get multicore working for about a
 day
   and a half now I'm nearly at wits end and unsure what to do anymore.
 **Any**
   help would be appreciated.
  
   I've installed Solr using the solr-jetty packages on Ubuntu 10.04. The
   default Solr install seems to work fine.
  
   Now, I want to add three cores: live, staging, preview to be used for
 the
   various states of the site.
  
   I've created a `solr.xml` file as follows and symlinked it in to
   /usr/share/solr:
  
    <?xml version="1.0" encoding="UTF-8" ?>
    <solr persistent="false">
      <cores adminPath="/admin/cores">
        <core name="preview" instanceDir="/home/webteam/config/search/preview"
              dataDir="/home/webteam/preview/data" />
        <core name="staging" instanceDir="/home/webteam/config/search/staging"
              dataDir="/home/webteam/staging/data" />
        <core name="live" instanceDir="/home/webteam/config/search/live"
              dataDir="/home/webteam/live/data" />
      </cores>
    </solr>
  
   Now, when I try to view any cores, I get a 404 - Not found. In fact, I
   can't even view /solr/admin/ anymore after installing that `solr.xml`
 file.
  
   Also, /solr/admin/cores returns an XML file, but it looks to me like
   there's no cores listed. The output:
  
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
      </lst>
      <lst name="status"/>
    </response>
  
  
   Finally, looking through the logs produced by Jetty doesn't seem to
 reveal
   any clues about what is wrong. There doesn't seem to be any errors in
 there,
   except the 404s.
  
   Long story short. I'm stuck. Any suggestions on where to go with this?
  
   David
 
 
  --
 
  *Alexei Martchenko* | *CEO* | Superdownloads
  ale...@superdownloads.com.br (mailto:ale...@superdownloads.com.br) |
 ale...@martchenko.com.br (mailto:ale...@martchenko.com.br) | (11)
  5083.1018/5080.3535/5080.3533




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Unable to get multicore working

2011-08-16 Thread Alexei Martchenko
Is your solr.xml in usr/share/jetty/solr/solr.xml?

lets try this xml instead

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core01" instanceDir="core01" />
    <core name="core02" instanceDir="core02" />
    <core name="core03" instanceDir="core03" />
  </cores>
</solr>

Can you see the logs? You should see something like this

16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader init
*INFO: Solr home set to 'solr/'*
16/08/2011 17:30:55 org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader locateSolrHome
*INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)*
16/08/2011 17:30:55 org.apache.solr.core.CoreContainer$Initializer
initialize
*INFO: looking for solr.xml: usr/share/jetty/solr/solr.xml*
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader locateSolrHome
*INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)*
16/08/2011 17:30:55 org.apache.solr.core.CoreContainer init
*INFO: New CoreContainer: solrHome=solr/ instance=21357269*
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader init
*INFO: Solr home set to 'solr/'*
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader init
*INFO: Solr home set to 'solr\core01\'*

2011/8/16 David Sauve dnsa...@gmail.com

 Just the one `solr.xml`. The one I added (well, symlinked form my config
 folder -- I like to keep my configurations files organized so they can be
 managed by git)

 `start.jar` is in `usr/share/jetty/start.jar`.


 On Tuesday, 16 August, 2011 at 1:33 PM, Alexei Martchenko wrote:

  AFAIK you're still seeing singlecore version
 
  where is your start.jar?
 
  search for solr.xml, see how many u've got plz.
 
  2011/8/16 David Sauve dnsa...@gmail.com (mailto:dnsa...@gmail.com)
 
I've installed using aptitude so I don't have an example folder (that
 I
   can find).
  
   /solr/ does work (but lists no cores)
   /solr/live/admin/ does not -- 404
  
  
   On Tuesday, 16 August, 2011 at 1:13 PM, Alexei Martchenko wrote:
  
Lets try something simplier.
My start.jar is on \apache-solr-3.3.0\example\
Here's my local config placed in \apache-solr-3.3.0\example\solr\
   
     <?xml version="1.0" encoding="UTF-8" ?>
     <solr persistent="true">
       <cores adminPath="/admin/cores">
         <core name="softwares01" instanceDir="softwares01" />
       </cores>
     </solr>
   
Create \apache-solr-3.3.0\example\solr\softwares01\conf\
and \apache-solr-3.3.0\example\solr\softwares01\data\
   
http://localhost:8983/solr/ should work and so is
http://localhost:8983/solr/softwares01/admin/
   
   
   
2011/8/16 David Sauve dnsa...@gmail.com (mailto:dnsa...@gmail.com)
   
 I've been trying (unsuccessfully) to get multicore working for
 about a
   day
 and a half now I'm nearly at wits end and unsure what to do
 anymore.
   **Any**
 help would be appreciated.

 I've installed Solr using the solr-jetty packages on Ubuntu 10.04.
 The
 default Solr install seems to work fine.

 Now, I want to add three cores: live, staging, preview to be used
 for
   the
 various states of the site.

 I've created a `solr.xml` file as follows and symlinked it in to
 /usr/share/solr:

  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="false">
    <cores adminPath="/admin/cores">
      <core name="preview" instanceDir="/home/webteam/config/search/preview"
            dataDir="/home/webteam/preview/data" />
      <core name="staging" instanceDir="/home/webteam/config/search/staging"
            dataDir="/home/webteam/staging/data" />
      <core name="live" instanceDir="/home/webteam/config/search/live"
            dataDir="/home/webteam/live/data" />
    </cores>
  </solr>

 Now, when I try to view any cores, I get a 404 - Not found. In
 fact, I
 can't even view /solr/admin/ anymore after installing that
 `solr.xml`
   file.

 Also, /solr/admin/cores returns an XML file, but it looks to me
 like
 there's no cores listed. The output:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
    </lst>
    <lst name="status"/>
  </response>


 Finally, looking through the logs produced by Jetty doesn't seem to
   reveal
 any clues about what is wrong. There doesn't seem to be any errors
 in
   there,
 except the 404s.

 Long story short. I'm stuck. Any suggestions on where to go with
 this?

 David
   
   
--
   
*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br (mailto:ale...@superdownloads.com.br) |
   ale...@martchenko.com.br (mailto:ale...@martchenko.com.br) | (11)
5083.1018

Re: Migration from Autonomy IDOL to SOLR

2011-08-15 Thread Alexei Martchenko
This might be a longshot but... Adobe is deprecating Verity in the ColdFusion
engine. Version 9 ships both engines, but I believe CF10 will only have Solr
bundled. IDOL is the new Verity, since Autonomy acquired Verity. Although
Adobe wraps Solr to work like the old Verity, there might be some info from
people who migrated from Verity to Solr a few years ago.

Sorry for not helping much but sometimes these little information leads to
something.

2011/8/15 Arcadius Ahouansou arcad...@menelic.com

 Hello.

 We have a couple of application running on half a dozen Autonomy IDOL
 servers.
 Currently, all feature we need are supported by Solr.

 We have done some internal testing and realized that SOLR would do a better
 job.

 So, we are investigation all possibilities for a smooth migration from IDOL
 to SOLR.

 I am looking for advice from people who went through something similar.

 Ideally, we would like to keep most of our legacy code unchanged and have a
 kind of query-translation-layer plugged into our app if possible.

 -Is there lib available?

 -Any thought?

 Thanks.

 Arcadius.




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


hl.useFastVectorHighlighter, fragmentsBuilder and HighlightingParameters

2011-08-15 Thread Alexei Martchenko
I'm having some trouble trying to upgrade my old highlighter
from the highlighter/fragmenter/formatter format (1.4 version, default
config on the Solr website) to the new FastVectorHighlighter.

I'm using Solr 3.3.0 with <luceneMatchVersion>LUCENE_33</luceneMatchVersion>
in config

In my solrconfig.xml I added these lines:

in the default request handler:

<bool name="hl.useFastVectorHighlighter">true</bool>
<bool name="hl.usePhraseHighlighter">true</bool>
<bool name="hl.highlightMultiTerm">true</bool>
<str name="hl.fragmentsBuilder">colored</str>

and

<fragmentsBuilder name="colored"
    class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawgreen">,
      <b style="background:aquamarine">,<b style="background:magenta">,
      <b style="background:palegreen">,<b style="background:coral">,
      <b style="background:wheat">,<b style="background:khaki">,
      <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>

All I get is: ('grave' means severe)

15/08/2011 20:44:19 org.apache.solr.common.SolrException log
GRAVE: org.apache.solr.common.SolrException: Unknown fragmentsBuilder:
colored
at
org.apache.solr.highlight.DefaultSolrHighlighter.getSolrFragmentsBuilder(DefaultSolrHighlighter.java:320)
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:508)

at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:376)
at
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Docs in http://wiki.apache.org/solr/HighlightingParameters say:

hl.fragmentsBuilder

Specify the name of SolrFragmentsBuilder
(http://wiki.apache.org/solr/SolrFragmentsBuilder). Since Solr 3.1
(http://wiki.apache.org/solr/Solr3.1), this parameter makes sense for
FastVectorHighlighter (http://wiki.apache.org/solr/FastVectorHighlighter)
only.

SolrFragmentsBuilder respects the hl.tag.pre/post parameters:

<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored"
    class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawgreen">,
      <b style="background:aquamarine">,<b style="background:magenta">,
      <b style="background:palegreen">,<b style="background:coral">,
      <b style="background:wheat">,<b style="background:khaki">,
      <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>
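For readers hitting the same "Unknown fragmentsBuilder" error: one plausible cause (my reading, not confirmed in this thread) is that in Solr 3.x a named fragmentsBuilder is only registered when it is declared inside the <highlighting> section of the highlight search component in solrconfig.xml, not as a top-level element. A minimal sketch:

```
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <!-- registering the builder here is what lets
         hl.fragmentsBuilder=colored resolve it -->
    <fragmentsBuilder name="colored"
        class="solr.highlight.ScoreOrderFragmentsBuilder">
      <lst name="defaults">
        <str name="hl.tag.pre"><![CDATA[<b style="background:yellow">]]></str>
        <str name="hl.tag.post"><![CDATA[</b>]]></str>
      </lst>
    </fragmentsBuilder>
  </highlighting>
</searchComponent>
```

The exact surrounding elements depend on the stock solrconfig.xml of the release in use.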


-- 

*Alexei*


Re: strip html from data

2011-08-11 Thread Alexei Martchenko
. There are still <h3> tags
   inside the data. Although I believe there are fewer than before,
   I cannot prove that. Fact is, there are still html tags inside the
   data.
  
   Any other ideas what the problem could be?
  
  
  
  
  
   2011/7/25 Markus Jelsma markus.jel...@openindex.io
   
  
  
  
   You've three analyzer elements, i wonder what that would do. You
  need
   to add
   the char filter to the index-time analyzer.
  
   On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote:
  
  
   Hi there,
  
   I am trying to strip html tags from the data before adding the
   documents
  
  
   to
  
  
  
   the index. To do that I altered schem.xml like this:
  <fieldType name="text" class="solr.TextField"
      positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="0"
          catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>

  <fields>
    <field name="text" type="text" indexed="true" stored="true"
        required="false"/>
  </fields>
  
   Unfortunately this does not work, the html tags like <h3> are
   still present after restarting and reindexing. I also tried
   HTMLStripTransformer, but this did not work either.
  
   Has anybody an idea how to get this done? Thank you in advance
  for
   any hint.
  
   Merlin
  
  
   --
   Markus Jelsma - CTO - Openindex
    http://www.linkedin.com/in/markus17
   050-8536620 / 06-50258350
  
  
  
  
  
  
 




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: how to ignore case in solr search field?

2011-08-11 Thread Alexei Martchenko
Here's an example. Since I only query this for spelling, I can lowercase both
at index and query time.

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="10"
    stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

2011/8/10 nagarjuna nagarjuna.avul...@gmail.com

 Hi please help me ..
how to ignore case while searching in solr


 ex:i need same results for the keywords abc, ABC , aBc,AbC and all the
 cases.




 Thank u in advance

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-ignore-case-in-solr-search-field-tp3242967p3242967.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-11 Thread Alexei Martchenko
are you boosting your docs?

2011/8/8 Jason Toy jason...@gmail.com

 I am trying to test out and compare different sorts and scoring.

  When I use dismax to search for indie music
 with: qf=all_lists_text&q=indie+music&defType=dismax&rows=100
 I see some stuff that seems irrelevant, meaning in top results I see only
 1 or 2 mentions of indie music, but when I look further down the list I
 do
 see other docs that have more occurrences of indie music.
 So I a want to test by comparing the the different queries versus seeing a
 list of docs ranked specifically by the count of occurrences of the phrase
 indie music

 On Mon, Aug 8, 2011 at 2:19 PM, Markus Jelsma markus.jel...@openindex.io
 wrote:

 
   Dismax queries can. But
  
   sort=termfreq(all_lists_text,'indie+music')
  
   is not using dismax.  Apparenty termfreq function can not? I am not
   familiar with the termfreq function.
 
  It simply returns the TF of the given _term_  as it is indexed of the
  current
  document.
 
  Sorting on TF like this seems strange as by default queries are already
  sorted
  that way since TF plays a big role in the final score.
 
  
   To understand why you'd need to reindex, you might want to read up on
 how
   lucene actually works, to get a basic understanding of how different
   indexing choices effect what is possible at query time. Lucene In
 Action
   is a pretty good book.
  
   On 8/8/2011 5:02 PM, Jason Toy wrote:
Are not  Dismax queries able to search for phrases using the default
index(which is what I am using?) If I can already do phrase
  searches,
  I
don't understand why I would need to reindex t be able to access
  phrases
from a function.
   
On Mon, Aug 8, 2011 at 1:49 PM, Markus
  Jelsmamarkus.jel...@openindex.iowrote:
Aelexei, thank you , that does seem to work.
   
My sort results seem to be totally wrong though, I'm not sure if
 its
because of my sort function or something else.
   
My query consists of:
 sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100
And I get back 4571232 hits.
   
That's normal, you issue a catch all query. Sorting should work
 but..
   
All the results don't have the phrase indie music anywhere in
 their
   
data.
   
  Does termfreq not support phrases?
   
No, it is TERM frequency and indie music is not one term. I don't
 know
how this function parses your input but it might not understand your
 +
escape and
think it's one term constisting of exactly that.
   
If not, how can I sort specifically by termfreq of a phrase?
   
You cannot. What you can do is index multiple terms as one term
 using
the shingle filter. Take care, it can significantly increase your
  index
size and
number of unique terms.
   
On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko
   
ale...@superdownloads.com.br  wrote:
You can use the standard query parser and pass q=*:*
   
2011/8/8 Jason Toyjason...@gmail.com
   
I am trying to list some data based on a function I run ,
specifically  termfreq(post_text,'indie music')  and I am unable
 to
   
do
   
it without passing in data to the q paramater.  Is it possible to
  get
a
   
sorted
   
list without searching for any terms?
   
--
   
*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533
 



 --
 - sent from my mobile
 6176064373




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: is it possible to do a sort without query?

2011-08-08 Thread Alexei Martchenko
You can use the standard query parser and pass q=*:*
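For completeness, a full request combining the catch-all query with a function sort might look like this (host, port and field name are assumptions, and termfreq takes a single indexed term, not a phrase):

```
http://localhost:8983/solr/select?q=*:*&sort=termfreq(post_text,'indie')+desc&rows=10
```

The q=*:* matches every document, so the sort alone determines the ordering of the response.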

2011/8/8 Jason Toy jason...@gmail.com

 I am trying to list some data based on a function I run ,
 specifically  termfreq(post_text,'indie music')  and I am unable to do it
 without passing in data to the q paramater.  Is it possible to get a sorted
 list without searching for any terms?




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)

2011-08-01 Thread Alexei Martchenko
I'd try solr.PhoneticFilterFactory, it usually converts these slight
differences... schmidt, smith and schmid will be something like XMDT
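A rough sketch of what that could look like in schema.xml (the field type name is illustrative, and the encoder choice would need testing against German surnames):

```
<fieldType name="text_phonetic" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- DoubleMetaphone maps schmid/schmidt to the same code;
         inject="true" keeps the original token alongside the
         phonetic one so exact matches still score higher -->
    <filter class="solr.PhoneticFilterFactory"
        encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>
```

Other encoders (Metaphone, Soundex, RefinedSoundex, Caverphone) are available and differ in how aggressively they collapse spellings.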

2011/8/1 thomas tom.erfu...@googlemail.com

 Hi,
 we have several entries in our database our customer would like to find
 when
 using a not exactly matching search string. The Problem is kind of related
 to spelling correction and synonyms. But instead of single entries in
 synonyms.txt we would like a automatic solution for this group of problems:

 When searching for the name: schmid we want to find also documents with
 the name schmidt included. There are analog names like hildebrand and
 hildebrandt and more. That is the reason we'd like to find a automatic
 solution for this group of words.

 We allready use the following filters in our index chain
 <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
     dictionary="dictionary_de.txt"/>
 <filter class="solr.SnowballPorterFilterFactory" language="German2"
     protected="protwords.txt"/>

 Unfortunatelly the german stemmer is not handling such problems. Nor is
 this
 a problem related to compound words.

 Does anyone know of a solution? maybe its possible to set up a filter rule
 to extend words ending with letter d automatically with letter t in the
 query chain? Or other direction to remove t letters after d letters in
 index chain.

 Thanks a lot
 Thomas

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/German-language-specific-problem-automatic-Spelling-correction-automatic-Synonyms-tp3216278p3216278.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Query on multi valued field

2011-07-31 Thread Alexei Martchenko
have you tried multi:1 and multi:2 and multi:3 ?
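If the goal is "documents with more than one value, whatever the values are", a common workaround (not from this thread, just a frequent pattern) is to have the indexing client write the value count into a sibling field and range-query it:

```
<!-- illustrative sibling field; the client fills it with the number
     of values it puts into "multi" at index time -->
<field name="multi_count" type="int" indexed="true" stored="true"
    required="false"/>
```

A query like q=multi_count:[2 TO *] would then return only the multi-valued documents.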

2011/7/29 rajini maski rajinima...@gmail.com

 Hi All,

   I have a specific requirement in the multi-valued field type.The
 requirement is as follows

 There is a multivalued field in each document which can have mutliple
 elements or single element.

 For Eg: Consider that following are the documents matched for say q= *:*

 *DOC1*

 <doc>
   <arr name="multi">
     <str>1</str>
   </arr>
 </doc>

 *DOC2*

 <doc>
   <arr name="multi">
     <str>1</str>
     <str>3</str>
     <str>4</str>
   </arr>
 </doc>

 *DOC3*

 <doc>
   <arr name="multi">
     <str>1</str>
     <str>2</str>
   </arr>
 </doc>

The query is get only those documents which have multiple elements for
 that multivalued field.

 I.e, doc 2 and 3  should be returned from the above set..

 Is there anyway to achieve this?


 Awaiting reply,

 Thanks  Regards,
 Rajani




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Solr Incremental Indexing

2011-07-30 Thread Alexei Martchenko
I always have a field in my databases called datelastmodified, so whenever I
update a record, I set it to getdate() (the MSSQL function) and then fetch all
the latest records ordered by that field.
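The thread uses SolrJ, but for reference the same last-modified pattern is what DataImportHandler's delta import encodes; a sketch with illustrative table and column names:

```
<entity name="item" pk="id"
    query="select id, title from item"
    deltaQuery="select id from item
                where datelastmodified &gt; '${dataimporter.last_index_time}'"
    deltaImportQuery="select id, title from item
                      where id = '${dataimporter.delta.id}'"/>
```

deltaQuery finds the changed primary keys since the last run, and deltaImportQuery re-fetches each of those rows for indexing.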

2011/7/29 Mohammed Lateef Hussain mohammedlateefh...@gmail.com

 Hi

 Need some help in Solr incremental indexing approch.

 I have built my Solr index using SolrJ API and now want to update the index
 whenever any changes has been made in
 database. My requirement is not to use DB triggers to call any update
 events.

 I want to update my index on the fly whenever my application updates any
 record in database.

 Note: My indexing logic to get the required data from DB is some what
 complex and involves many tables.

 Please suggest me how can I proceed here.

 Thanks
 Lateef




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: schema.xml changes, need re-indexing ?

2011-07-27 Thread Alexei Martchenko
I believe you're fine with that. Don't need to reindex all solr database.

2011/7/27 Charles-Andre Martin charles-andre.mar...@sunmedia.ca

 Hi,



 We currently have a big index in production. We would like to add 2
 non-required fields to our schema.xml :



 <field name="myfield" type="boolean" indexed="true" stored="true"
     required="false"/>

 <field name="myotherfield" type="string" indexed="true" stored="true"
     required="false" multiValued="true"/>



 I made some tests:



 -  I stopped tomcat

 -  I changed the schema.xml

 -  I started tomcat



 The data was still there and I was able to add new document with theses 2
 fields.



 So far, it looks I won't need to re-index all my data. Am I right ? Do I
 need to re-index all my data or in that case I'm fine ?



 Thank you !



 Charles-André Martin




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: schema.xml changes, need re-indexing ?

2011-07-27 Thread Alexei Martchenko
I always run
http://localhost:8983/solr/admin/cores?action=RELOAD&core=corename in the
browser when I wanna reload solr and see any changes in config xmls.

2011/7/27 François Schiettecatte fschietteca...@gmail.com

 I have not seen this mentioned anywhere, but I found a useful 'trick' to
 restart solr without having to restart tomcat. All you need to do is 'touch'
 the solr.xml in the solr.home directory. It can take a few seconds but solr
 will restart and reload any config.

 Cheers

 François

 On Jul 27, 2011, at 2:56 PM, Alexei Martchenko wrote:

  I believe you're fine with that. Don't need to reindex all solr database.
 
  2011/7/27 Charles-Andre Martin charles-andre.mar...@sunmedia.ca
 
  Hi,
 
 
 
  We currently have a big index in production. We would like to add 2
  non-required fields to our schema.xml :
 
 
 
  <field name="myfield" type="boolean" indexed="true" stored="true"
      required="false"/>
 
  <field name="myotherfield" type="string" indexed="true" stored="true"
      required="false" multiValued="true"/>
 
 
 
  I made some tests:
 
 
 
  -  I stopped tomcat
 
  -  I changed the schema.xml
 
  -  I started tomcat
 
 
 
  The data was still there and I was able to add new document with theses
 2
  fields.
 
 
 
  So far, it looks I won't need to re-index all my data. Am I right ? Do I
  need to re-index all my data or in that case I'm fine ?
 
 
 
  Thank you !
 
 
 
  Charles-André Martin
 
 
 
 
  --
 
  *Alexei Martchenko* | *CEO* | Superdownloads
  ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
  5083.1018/5080.3535/5080.3533