Re: If you could have one feature in Solr...

2010-04-07 Thread Ingo Renner

On 24.02.2010 at 14:42, Grant Ingersoll wrote:

 What would it be?

Remote administration/editing/filling of synonyms.txt, stopwords.txt, ... 
through a request handler, maybe a JSON interface or similar


best
Ingo

-- 
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2, Admin Google Summer of Code

Apache Solr for TYPO3: http://www.typo3-solr.com



Re: If you could have one feature in Solr...

2010-04-07 Thread Ingo Renner

On 25.02.2010 at 02:07, Andy wrote:

 1) Built-in hierarchical faceting
 Right now there are 2 patches, SOLR-64 and SOLR-792. SOLR-64 seems to be 
 slated for the 1.5 release but, according to the wiki, seems to have poor 
 performance. SOLR-792 has better performance according to the wiki, but it's 
 unclear if it'll ever be part of the Solr distribution. Not sure what the 
 pros & cons of the 2 patches are. Hierarchical faceting is very common and it'd 
 be nice to have this as a standard feature.
 
 2) Near real-time update & search
 
 3) Partial update - something like LUCENE-1879
 
 4) Automatic language detection & stemmer selection
 I have document collections that are a mix of different languages. It'd be 
 great to be able to automatically detect what language a specific document is 
 in and select an appropriate stemmer for that language.

+1 on all of these


Ingo

-- 
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2, Admin Google Summer of Code

Apache Solr for TYPO3: http://www.typo3-solr.com



Re: If you could have one feature in Solr...

2010-03-25 Thread Jacob Elder
   1. Real time or near-real time updates.
   2. First-class spatial search.

On Wed, Feb 24, 2010 at 9:42 AM, Grant Ingersoll gsing...@apache.org wrote:

 What would it be?




-- 
Jacob Elder


Re: If you could have one feature in Solr...

2010-03-24 Thread Teruhiko Kurosaka
(Sorry for very late response on this topic.)

On Feb 28, 2010, at 5:47 AM, Adrien Specq wrote:

 - language attribute for each field

I was thinking about it and it was one of my wishes.
Currently, Solr practically requires that we have
a field for each natural language that an application
supports.  If the app needs to support English, French and
German, we would have to have title_en, title_fr, and title_de
(suffixes are ISO 2-letter lang codes) instead of just 
a title field.  This isn't pretty.  

What if we want to support 15 languages?  It would be much 
better if we could have just one title field and language 
information associated with the value.  

But after I thought about it a bit deeper, I think the
current ugly solution is actually practical.  This is because 
most users want to find documents in the languages they 
understand.  So if a user indicates they understand English and 
German only, we just need to search title_en and title_de.

Maybe I'm missing something...
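
As a minimal sketch of the per-language field approach described above: the helper below builds a Lucene-syntax query that touches only the title_<lang> fields for the languages a user understands. The function name and its parameters are illustrative assumptions, not part of Solr.

```python
def build_language_query(user_query, languages, base_field="title"):
    """Build a query over the per-language variants of one field.

    The field naming (title_en, title_de, ...) follows the convention
    from the message above; everything else here is hypothetical.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Escape backslashes and double quotes so the user text stays one phrase.
    phrase = user_query.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [f'{base_field}_{lang}:"{phrase}"' for lang in languages]
    return " OR ".join(clauses)


print(build_language_query("search engine", ["en", "de"]))
```

With 15 supported languages the schema still needs 15 fields, but each query only names the two or three fields that matter for that user.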


Teruhiko Kuro Kurosaka, 415-227-9600 x122
RLP + Lucene & Solr = powerful search for global contents



Re: If you could have one feature in Solr...

2010-03-24 Thread Dennis Gearon
Most databases only RECENTLY have set up languages per column. Languages per 
ENTRY in a column? I don't think any support that yet. How would you get that 
information from a database with the corresponding language attribute?


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php





Re: If you could have one feature in Solr...

2010-03-24 Thread Teruhiko Kurosaka
First of all, I am not really concerned with the per-field
(or per-column, in DB terms) portion of the original request.
Most documents are monolingual.

How languages are identified depends on your application,
and database support of language tagging is not necessary.

The database schema designer may have created a field that 
stores the language information, for example.

If you are indexing documents that live in a file system,
the directory hierarchy or the name of the documents might
tell the language, assuming you have set up some standard
naming convention.

HTML documents may have the META tag for Content-Language.  
If it is from an HTTP feed, there may be Content-Language header.

And if all else fails, or the information is not reliable, the language 
can be determined by analyzing the document statistically with software
such as Nutch's Language Identifier, or commercial language identifier
software such as the one my employer, Basis Technology, sells.
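
The fallback order listed above can be sketched as a small resolver. This is only an illustration under stated assumptions: the record keys (db_language, path, content_language, text), the two-letter directory convention and the detect callback are all hypothetical, and the statistical identifier is left to the caller.

```python
import re


def language_from_path(path):
    """Hypothetical convention: a two-letter directory, e.g. docs/fr/report.html."""
    match = re.search(r"(?:^|/)([a-z]{2})/[^/]+$", path)
    return match.group(1) if match else None


def resolve_language(record, detect=None):
    """Return the first language hint found, trying the cheap sources first."""
    hint = (
        record.get("db_language")                      # explicit column in the database
        or language_from_path(record.get("path", ""))  # naming convention on disk
        or record.get("content_language")              # META tag or HTTP header
    )
    if hint:
        return hint.lower()[:2]  # "de-DE" becomes "de"
    if detect is not None:
        # Last resort: statistical identification supplied by the caller.
        return detect(record.get("text", ""))
    return None
```

Keeping the statistical detector as the last step means it only runs when the metadata is missing.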



Teruhiko Kuro Kurosaka, 415-227-9600 x122
RLP + Lucene & Solr = powerful search for global contents


Re: If you could have one feature in Solr...

2010-03-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Mar 5, 2010 at 4:34 AM, Mark Miller markrmil...@gmail.com wrote:

 I've been trying to think of ways to tackle this. I hate getConfigDir - it
 basically lets anyone just get around the ResourceLoader.

 It would be awesome to get rid of it somehow - it would make
 ZooKeeperSolrResourceLoader so much easier to get working correctly across
 the board.
Why not just get rid of it? Components depending on the filesystem are a
big headache.

 The main thing I'm hung up on is how to update a file - some code I've seen
 uses getConfigDir to update files, e.g. you get the content of solrconfig, then
 you want to update it and reload the core. Most other things, I think, are
 doable without getConfigDir.

 QueryElevationComponent is actually sort of simple to get around - we just
 need to add an exists method that returns true/false depending on whether the
 resource exists. QEC just uses getConfigDir to do an exists check on
 elevate.xml - if it's not there, it looks in the data dir.

 --
 - Mark

 http://www.lucidimagination.com







-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: If you could have one feature in Solr...

2010-03-04 Thread Chris Hostetter
: The ability to read solr configuration files from the classpath instead of
: solr.solr.home directory.

Solr has always supported this.  

When SolrResourceLoader.openResource is asked to open a resource, it 
first checks if it's an absolute path -- if it's not, then it checks 
relative to the conf dir (under whatever the instanceDir is, i.e. Solr Home 
in a single core setup), then it checks relative to the current working dir, 
and if it still can't find it, it checks via the current ClassLoader.
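
To make the lookup order above concrete, here is a rough Python imitation of it. This is not Solr's code: locate_resource and its arguments are made up for illustration, and a plain list of directories stands in for the Java ClassLoader.

```python
import os
import tempfile


def locate_resource(name, conf_dir, cwd, classpath_dirs):
    """Imitate the described order and return the first existing path, or None."""
    if os.path.isabs(name):
        return name if os.path.exists(name) else None
    # 1. relative to the conf dir, 2. relative to the working directory
    candidates = [os.path.join(conf_dir, name), os.path.join(cwd, name)]
    # 3. stand-in for the ClassLoader: directories that are searched last
    candidates += [os.path.join(d, name) for d in classpath_dirs]
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return None


# Demo on a throwaway tree: the file exists only on the pretend classpath.
with tempfile.TemporaryDirectory() as root:
    conf_dir = os.path.join(root, "conf")
    classpath_dir = os.path.join(root, "classes")
    os.makedirs(conf_dir)
    os.makedirs(classpath_dir)
    open(os.path.join(classpath_dir, "elevate.xml"), "w").close()
    found = locate_resource("elevate.xml", conf_dir, root, [classpath_dir])
    print(os.path.relpath(found, root))
```

Code that builds a file path from the conf dir directly only ever sees step 1, which is the kind of shortcut suspected below.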

That said: it's not something that a lot of people have ever taken 
advantage of, so it wouldn't surprise me if some features in Solr are 
buggy because they try to open files directly w/o utilizing 
openResource -- in particular a quick test of the trunk example 
using...

java -Djetty.class.path=./solr/conf -Dsolr.solr.home=/tmp/new-solr-home -jar start.jar

...seems to suggest that QueryElevationComponent isn't using openResource 
to look for elevate.xml  (I set solr.solr.home in that line so Solr would 
*NOT* attempt to look at ./solr ... it does need some sort of Solr Home, 
but in this case it was a completely empty directory)


-Hoss



Re: If you could have one feature in Solr...

2010-03-04 Thread Mark Miller



I've been trying to think of ways to tackle this. I hate getConfigDir - 
it basically lets anyone just get around the ResourceLoader.


It would be awesome to get rid of it somehow - it would make 
ZooKeeperSolrResourceLoader so much easier to get working correctly 
across the board.


The main thing I'm hung up on is how to update a file - some code I've 
seen uses getConfigDir to update files, e.g. you get the content of 
solrconfig, then you want to update it and reload the core. Most other 
things, I think, are doable without getConfigDir.


QueryElevationComponent is actually sort of simple to get around - we 
just need to add an exists method that returns true/false depending on 
whether the resource exists. QEC just uses getConfigDir to do an exists 
check on elevate.xml - if it's not there, it looks in the data dir.


--
- Mark

http://www.lucidimagination.com





Re: If you could have one feature in Solr...

2010-02-28 Thread Adrien Specq
 - Built-in hierarchical faceting
and
 - language attribute for each field

On Sat, Feb 27, 2010 at 9:59 PM, Stephen Weiss swe...@stylesight.com wrote:

 I think an examples page would be a good idea.  We've already implemented
 search in Chinese, Japanese, and Spanish back with 1.3, but it was not
 really very well laid out how it was supposed to work - I had to dig through
 bits and pieces of people's configs left in the mailing list archives - and
 to be honest, I've never been 100% positive that we did it the right way.
  On the other hand, that it was possible was pretty obvious to me from
 reading the documentation (it was all in the API docs), it was just *how* to
 implement it that wasn't very clear for a non-java/lucene programmer like
 myself.

 --
 Steve


 On Feb 25, 2010, at 1:06 PM, Robert Muir wrote:

  Yeah, Thai and Arabic have the stuff in Solr 1.4
 For Chinese, if you want to do CJK bigram indexing, this is there too.
 If you want to do word-based smart indexing, you need to add an
 additional
 jar file to your classpath.

 we can add a wiki page with examples of how to use these maybe to make it
 easier?

 we could also add notes to new ones in lucene (hindi, czech, bulgarian,
 etc), as it might be easier to copy some code around and get them working
 with solr 1.4 than to write your own!

 separately, would you be interested in helping with Bengali and Marathi?





Re: If you could have one feature in Solr...

2010-02-28 Thread Ian Holsman

On 2/24/10 8:42 AM, Grant Ingersoll wrote:

What would it be?

   

most of this will be coming in 1.5,
but for me it's

- sharding.. it still seems a bit clunky

secondly.. this one isn't in 1.5.
I'd like to be able to find interesting terms that appear in my result 
set that don't appear in the global corpus.


it's kind of like doing a facet count on *:* and then on the search term 
and discount the terms that appear heavily on the global one.
(sorry.. there is a textbook definition of this.. XX distance.. but I 
haven't got the books in front of me).








Re: If you could have one feature in Solr...

2010-02-28 Thread Andrzej Bialecki

On 2010-02-28 17:26, Ian Holsman wrote:

On 2/24/10 8:42 AM, Grant Ingersoll wrote:

What would it be?


most of this will be coming in 1.5,
but for me it's

- sharding.. it still seems a bit clunky

secondly.. this one isn't in 1.5.
I'd like to be able to find interesting terms that appear in my result
set that don't appear in the global corpus.

it's kind of like doing a facet count on *:* and then on the search term
and discount the terms that appear heavily on the global one.
(sorry.. there is a textbook definition of this.. XX distance.. but I
haven't got the books in front of me).


Kullback-Leibler divergence?
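
For illustration, one way to turn that idea into code is to rank each term by its pointwise contribution to the Kullback-Leibler divergence between the result-set term distribution and the global one. This is a sketch, not a Solr feature: the function name, the flat term lists and the add-one smoothing are all assumptions.

```python
import math
from collections import Counter


def interesting_terms(result_terms, corpus_terms, top_n=5):
    """Rank terms by their contribution to KL(result set || whole corpus)."""
    result_counts = Counter(result_terms)
    corpus_counts = Counter(corpus_terms)
    result_total = sum(result_counts.values())
    corpus_total = sum(corpus_counts.values())
    scores = {}
    for term, count in result_counts.items():
        p = count / result_total
        # Add-one smoothing so a term missing from the corpus sample has q > 0.
        q = (corpus_counts[term] + 1) / (corpus_total + len(corpus_counts) + 1)
        scores[term] = p * math.log(p / q)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


results = ["solr", "solr", "facet", "the", "the"]
corpus = ["the"] * 50 + ["solr"] * 2 + ["facet"] + ["a"] * 47
print(interesting_terms(results, corpus, top_n=3))
```

Terms that are frequent everywhere, such as "the" in this toy data, get a score near or below zero and sink to the bottom, which is the discounting described in the quoted message.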


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: If you could have one feature in Solr...

2010-02-28 Thread Erik Hatcher


On Feb 28, 2010, at 8:47 AM, Adrien Specq wrote:

- Built-in hierarchical faceting


Adrien - I'm curious what you mean by this exactly.  Could you  
describe your hierarchical faceting needs by example?  Often  
hierarchical faceting can be accomplished by simply indexing  
/level1/level2/... type string fields, but there certainly are other  
ways to go about it as well.
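
A small sketch of the path-string idea, under the assumption that each document stores every ancestor prefix of its category path in a multi-valued string field, so that an ordinary field facet (narrowed with facet.prefix, for example) returns counts at each level. The helper name and the sample paths are made up.

```python
from collections import Counter


def path_facet_values(path):
    """Expand '/a/b/c' into ['/a', '/a/b', '/a/b/c']."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


# Facet counts over two hypothetical documents.
docs = ["/electronics/cameras/dslr", "/electronics/phones"]
counts = Counter(value for doc in docs for value in path_facet_values(doc))
print(counts["/electronics"], counts["/electronics/cameras"])
```

The cost is a larger index, since a document three levels deep stores three values instead of one.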


Thanks,
Erik



Re: If you could have one feature in Solr...

2010-02-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Feb 24, 2010 at 7:18 PM, Patrick Sauts patrick.via...@gmail.com wrote:
 Synchronisation between the slaves to switch the new index at the same time
 after replication.

I shall open an issue for this, and let us figure out how best it should be done:
https://issues.apache.org/jira/browse/SOLR-1800


Re: If you could have one feature in Solr...

2010-02-27 Thread Grant Ingersoll

On Feb 26, 2010, at 11:28 PM, Dave Searle wrote:

 To have a coffee waiting for me every morning when I wake up. Marriage  
 material indeed. 


Dave,

Didn't you know that one already exists?

http://localhost:8983/solr/admin/coffeehandler?type=ethiopian&cream=false&sugar=true&togo=true

:-)

-Grant

Re: If you could have one feature in Solr...

2010-02-26 Thread Don Werve

Realtime search, hands down.


Re: If you could have one feature in Solr...

2010-02-26 Thread Stephen Weiss

+1

I have several projects backburnered in the hope realtime search will  
come to solr soon...


[m]

On Feb 26, 2010, at 8:37 PM, Don Werve d...@madwombat.com wrote:


Realtime search, hands down.


RE: If you could have one feature in Solr...

2010-02-26 Thread Stuart Yeates
The indexer looking for an xml:lang attribute on text fields and using the 
value to pick the tokeniser, dictionaries, etc. automatically (and knowing to 
look for them in the standard places).
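
Until something like that exists in the indexer, a client-side approximation is possible. The sketch below is an assumption-laden example, not Solr behaviour: it reads xml:lang from each field element of an update document and routes the value to a language-suffixed field name, which a schema could then map to a language-specific analyzer.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def route_fields(doc_xml, default_lang="en"):
    """Map each <field> to '<name>_<lang>' based on its xml:lang attribute."""
    routed = {}
    for field in ET.fromstring(doc_xml).iter("field"):
        lang = field.get(XML_LANG, default_lang).lower()[:2]
        routed.setdefault(f"{field.get('name')}_{lang}", []).append(field.text or "")
    return routed


doc = '<doc><field name="title" xml:lang="fr">Bonjour</field><field name="title">Hello</field></doc>'
print(route_fields(doc))
```

The default_lang fallback is a design choice for untagged fields; a statistical language identifier could be plugged in there instead.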

cheers
stuart

Re: If you could have one feature in Solr...

2010-02-26 Thread Dave Searle
To have a coffee waiting for me every morning when I wake up. Marriage  
material indeed. 


Re: If you could have one feature in Solr...

2010-02-25 Thread Stefano Cherchi
Grant, I'm not a Java developer but a sysadmin, and I've been struggling for a 
couple of months now to build a full web search engine stack based on Hadoop + 
Nutch + Solr.

I don't know much about the documentation for developers, so I trust you if you 
say it's good. 

What I do know is that I found good docs for the very first steps (installing 
and performing simple single-run crawling and indexing with near-default 
configurations), but I'm now facing a great lack of information about features 
useful in production scenarios. 

Now I need to dig deeper into how data are managed, into the workflow and all 
the features that are needed in the real world, and I find that the 
documentation is sparse, rather confused, incomplete, often quite old and 
spread across too many disconnected pieces. Stumbling around on the Net, I 
discovered I'm not alone, actually.

I'm experiencing many difficulties in understanding and implementing even quite 
basic features such as consistent incremental recrawling/reindexing, adding 
custom fields, data parsing, duplicate detection, automatic removal of old 
indexed documents based on insertion date and so on. 

I mean, I would like a more organic set of use cases suited to real-world 
scenarios (as starting points) and some in-depth explanation of exactly how the 
data flow from the crawler into the complex structure of Solr and how they are 
handled by the different components of the stack. (Nutch documentation is 
probably even worse, but this is not the right place to complain about that).

S

-- 
Anyone proposing to run Windows on servers should be prepared to explain 
what they know about servers that Google, Yahoo, and Amazon don't.
Paul Graham


A mathematician is a device for turning coffee into theorems.
Paul Erdos (who obviously never met a sysadmin)



- Original message -
 From: Grant Ingersoll gsing...@apache.org
 To: solr-user@lucene.apache.org
 Sent: Wed 24 February 2010, 18:54:32
 Subject: Re: If you could have one feature in Solr...
 
 
 On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote:
 
  Decent documentation. 
 
 What parts do you feel are lacking?  Or is it just across the board?  Wikis 
 are both good and bad for documentation, IMO.
 
 -Grant






Re: If you could have one feature in Solr...

2010-02-25 Thread Robert Muir
Gora, have you tried the Hindi Analyzer in lucene? if you add it to lucene,
the results exceed at least everything from FIRE 2008.

So I don't really understand where you are getting this information!



 Actually, the state of the art for NLP in Indian languages is
 quite poor, at least in the open-source world.





-- 
Robert Muir
rcm...@gmail.com


Re: If you could have one feature in Solr...

2010-02-25 Thread Gora Mohanty
On Thu, 25 Feb 2010 07:37:33 -0500
Robert Muir rcm...@gmail.com wrote:

 Gora, have you tried the Hindi Analyzer in lucene? if you add it
 to lucene, the results exceed at least everything from FIRE 2008.
[...]

Oh! No, sorry, I haven't. So far, I have only looked at search
through Solr, and I guess I definitely need to look at something
like this if it is available in Lucene.

 So I don't really understand where you are getting this
 information!

Ah, could be my ignorance in this case.

Regards,
Gora


Re: If you could have one feature in Solr...

2010-02-25 Thread Gora Mohanty
On Thu, 25 Feb 2010 07:54:06 -0500
Robert Muir rcm...@gmail.com wrote:

 Gora, I wonder perhaps if there is a documentation issue.
 
 e.g. Thai, Arabic, Chinese were mentioned here previously, these
 are all supported, too.
 
 Let me know if you have any ideas!

Sorry, are you saying that these are available in Solr 1.4?
If so, I have definitely fallen down on my reading.

If they are available in Lucene, but not in Solr, that is still
helpful, but it will take me a little while before I can devote
enough time to set up access to Lucene directly.

In any case, getting Indian languages (and, also down the road
other languages) working is an area that I am definitely interested
in.

Regards,
Gora


Re: If you could have one feature in Solr...

2010-02-25 Thread Ron Mayer
Erik Hatcher wrote:
 Ron - I think SOLR-792 meets the need you describe.  What do you think? 
 It's tree faceting, allowing you to facet down 2 levels deep
 arbitrarily on any two fields.  Ideally we'd enhance it to be of
 arbitrary depth too.

Nice! It certainly handles my main use case.

There are still a couple cases that would benefit from a more
flexible function returning data along with the facets.

In this app, each document represents a crime report describing,
for example, an auto theft.   Those documents have fields such
as the make & model of a car stolen.

In some cases the users would like to see numbers showing
the number of incidents involving cars of those types (which
I think is what Solr returns easily).  Sometimes instead of the
number of documents, they'd rather see the number of cars
involved - for example, if a single theft from a dealership involved
multiple cars.   And other times, they'd rather see the value
of the cars returned.

In SQL I can do a "select sum(value) from incidents join vehicles...",
and haven't (yet) found anything similar for facets in Solr.

Then again, maybe I should be using the database for that part.


 On Feb 24, 2010, at 6:40 PM, Ron Mayer wrote:
 
 Make
  FORD  GM  Honda  TOYOTA  []
 MONDAY 17   23   4   2
 TUESDAY11   9174 5
 WEDNESDAY   3   69   1


Re: If you could have one feature in Solr...

2010-02-25 Thread Gora Mohanty
On Thu, 25 Feb 2010 13:06:03 -0500
Robert Muir rcm...@gmail.com wrote:

 Yeah, Thai and Arabic have the stuff in Solr 1.4
 For Chinese, if you want to do CJK bigram indexing, this is there
 too. If you want to do word-based smart indexing, you need to
 add an additional jar file to your classpath.

OK, but unfortunately, I have little knowledge of these languages,
so I would not be able to evaluate to what extent they
are working.

 we can add a wiki page with examples of how to use these maybe to
 make it easier?
 
 we could also add notes to new ones in lucene (hindi, czech,
 bulgarian, etc), as it might be easier to copy some code around
 and get them working with solr 1.4 than to write your own!

That sounds great. I am all in favour of not trying to reinvent
the wheel, and probably badly at that.

 separately, would you be interested in helping with Bengali and
 Marathi?

The Indian languages that I am personally conversant with are
Hindi, and my native tongue, Oriya. Bengali is quite close to
Oriya linguistically, though with a different script. Marathi
shares a script with Hindi, but words in the language are
quite different. I can try to enlist other open-source folk
in India: We have been part of a moderately successful localisation
effort in India (http://indlinux.org).

So, yes, I would be interested, but probably I have a fair amount
of learning to do about what is needed in the context of a search
engine.

Regards,
Gora


Re: If you could have one feature in Solr...

2010-02-25 Thread Shawn Heisey
I would like to be able to do a delta import on arbitrary data, not a 
last modified date.  Specifically, our database has an auto_increment 
field called DID, or document identifier.  For changes to existing data, 
this field is updated any time a row is changed in any way, effectively 
turning it into a new document.  On the indexing side, we delete the old 
document and insert the new one.


We are currently using a pricey commercial indexing product (which we 
know is based on Lucene) and are in the process of developing a 
replacement with distributed Solr.  The dividing line between indexed 
and new data is the highest DID in the existing data set, which we track 
and only update when new data is successfully indexed.
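
The high-water-mark scheme described above can be sketched as follows. This is a toy model, not DataImportHandler code: the index is a plain dict keyed by DID, and the stable "key" field used to find the older version of a record is an assumption that the message does not spell out.

```python
def index_delta(rows, index, last_did):
    """Index rows with DID > last_did and return the new high-water mark."""
    pending = sorted((r for r in rows if r["did"] > last_did), key=lambda r: r["did"])
    staged = dict(index)  # work on a copy so a failure leaves the index untouched
    for row in pending:
        # A changed row gets a fresh DID, so drop any older version of the record.
        for old_did in [d for d, doc in staged.items() if doc["key"] == row["key"]]:
            del staged[old_did]
        staged[row["did"]] = row
    index.clear()
    index.update(staged)
    # The mark only advances after the batch has been applied.
    return pending[-1]["did"] if pending else last_did


index = {1: {"did": 1, "key": "a", "text": "old"}}
rows = [{"did": 2, "key": "b", "text": "x"}, {"did": 3, "key": "a", "text": "new"}]
mark = index_delta(rows, index, last_did=1)
print(mark, sorted(index))
```

Running the same call again with the returned mark is a no-op, which is what makes the DID usable as a delta boundary.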


If there's a better way to do this already (multiple cores and index 
merging?), I'm all ears.  We are not very far along, so we have a couple 
of weeks left to define our approach.


Thanks,
Shawn

On 2/24/2010 6:42 AM, Grant Ingersoll wrote:

What would it be?
   




Re: If you could have one feature in Solr...

2010-02-25 Thread Robert Muir
Yeah, Thai and Arabic have the stuff in Solr 1.4
For Chinese, if you want to do CJK bigram indexing, this is there too.
If you want to do word-based smart indexing, you need to add an additional
jar file to your classpath.

we can add a wiki page with examples of how to use these maybe to make it
easier?

we could also add notes to new ones in lucene (hindi, czech, bulgarian,
etc), as it might be easier to copy some code around and get them working
with solr 1.4 than to write your own!

separately, would you be interested in helping with Bengali and Marathi?

On Thu, Feb 25, 2010 at 10:48 AM, Gora Mohanty g...@srijan.in wrote:

 On Thu, 25 Feb 2010 07:54:06 -0500
 Robert Muir rcm...@gmail.com wrote:

  Gora, I wonder perhaps if there is a documentation issue.
 
  e.g. Thai, Arabic, Chinese were mentioned here previously, these
  are all supported, too.
 
  Let me know if you have any ideas!

 Sorry, are you saying that these are available in Solr 1.4?
 If so, I have definitely fallen down on my reading.

 If they are available in Lucene, but not in Solr, that is still
 helpful, but it will take me a little while before I can devote
 enough time to set up access to Lucene directly.

 In any case, getting Indian languages (and, also down the road
 other languages) working is an area that I am definitely interested
 in.

 Regards,
 Gora




-- 
Robert Muir
rcm...@gmail.com


Re: If you could have one feature in Solr...

2010-02-25 Thread Erik Hatcher
Ron - I think SOLR-792 meets the need you describe.  What do you  
think?  It's tree faceting, allowing you to facet down 2 levels deep  
arbitrarily on any two fields.  Ideally we'd enhance it to be of  
arbitrary depth too.


Erik


On Feb 24, 2010, at 6:40 PM, Ron Mayer wrote:

Another use for this is that I'd like to make a quicker
way of drilling down on my documents than going one facet
at a time by showing the user a 2-dimensional table that
combines 2 facets.   For example, showing a table like this
on the page:

Make
 FORD  GM  Honda  TOYOTA  []
MONDAY 17   23   4   2
TUESDAY11   9174 5
WEDNESDAY   3   69   1
...

and when the user clicks the 174 it automatically adds
the Vehicle Make = Toyota and Day of week = Tuesday
facets to the query.
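
A sketch of the two-facet table idea, computed client-side over already-fetched documents. The field names day and make, the sample data and both helper functions are hypothetical; SOLR-792 is the server-side work discussed in this thread.

```python
from collections import Counter


def pivot_counts(docs, row_field, col_field):
    """Count documents for every (row value, column value) pair."""
    return Counter((doc[row_field], doc[col_field]) for doc in docs)


def cell_filters(row_field, row_value, col_field, col_value):
    """Filter queries to add when the user clicks one cell of the table."""
    return [f'{row_field}:"{row_value}"', f'{col_field}:"{col_value}"']


docs = [{"day": "TUESDAY", "make": "TOYOTA"}] * 2 + [{"day": "MONDAY", "make": "FORD"}]
table = pivot_counts(docs, "day", "make")
print(table[("TUESDAY", "TOYOTA")])
print(cell_filters("day", "TUESDAY", "make", "TOYOTA"))
```

Each cell carries both facet values, so a single click can add two filters at once instead of drilling down one facet at a time.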





(I'm a solr newbie, so my apologies if this already exists, or
if it's just a bad idea, or if I should just be using another
tool for that (possibly in conjunction with solr), but)






Re: If you could have one feature in Solr...

2010-02-25 Thread Smiley, David W.
1. Spatial search
2. Ease of managing a sharded index, multi-server Solr instance.

I am aware these are in-progress, slated for Solr 1.5.

I may find myself getting involved on these shortly because I'm working on a 
very large scale search project requiring both.

~ David

On Feb 24, 2010, at 8:42 AM, Grant Ingersoll wrote:

 What would it be?



Re: If you could have one feature in Solr...

2010-02-25 Thread Lance Norskog
Error messages that make sense. I have to read the source far too
often when a simple change to error handling would make some feature
easy to use. If I want to read Java I'll use Lucene!

Passive-aggressive error handling is a related problem: when I do
something nonsensical I too often get "0 results found" instead of
"what does that mean?".

On Thu, Feb 25, 2010 at 12:52 PM, Smiley, David W. dsmi...@mitre.org wrote:
 1. Spatial search
 2. Ease of managing a sharded index, multi-server Solr instance.

 I am aware these are in-progress, slated for Solr 1.5.

 I may find myself getting involved on these shortly because I'm working on a 
 very large scale search project requiring both.

 ~ David

 On Feb 24, 2010, at 8:42 AM, Grant Ingersoll wrote:

 What would it be?





-- 
Lance Norskog
goks...@gmail.com


If you could have one feature in Solr...

2010-02-24 Thread Grant Ingersoll
What would it be?


Re: If you could have one feature in Solr...

2010-02-24 Thread Patrick Sauts
Synchronisation between the slaves to switch the new index at the same 
time after replication.



Grant Ingersoll wrote:

What would it be?

  




Re: If you could have one feature in Solr...

2010-02-24 Thread Stephen Duncan Jr
On Wed, Feb 24, 2010 at 8:42 AM, Grant Ingersoll gsing...@apache.org wrote:

 What would it be?


Near real-time search & faceting.

-- 
Stephen Duncan Jr
www.stephenduncanjr.com


Re: If you could have one feature in Solr...

2010-02-24 Thread Markus Jelsma
- performing multiple queries at once, perhaps abusing HTTP POST. On some 
application there is a page that executes five different queries. The HTTP 
overhead is not that much of a problem but it would be a nice to have.

- retrieving documents per facet, not unlike the results from the MoreLikeThis 
components, see: http://mail-archives.apache.org/mod_mbox//lucene-solr-
user/200905.mbox/%3c200905151342.56927.jeffrey.gel...@buyways.nl%3e for a use-
case we once had before, suggested by a colleage.

- stemmers for many more different languages

And of course (hopefully to be including in Solr 1.5:
- field collapsing
- Solr Spatial
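
Until something like the multi-query request in the first item exists, one client-side workaround is to issue the queries concurrently. The following is only a sketch: the base URL, the `/select` path, and the helper names (`build_select_url`, `run_queries`) are assumptions for illustration, and it does not reduce the number of HTTP round trips, only the wall-clock time spent waiting for them.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, params):
    """Build a query URL; the /select path is an assumption about the setup."""
    return f"{base_url}/select?{urlencode(params)}"


def http_fetch(url):
    """Fetch one URL and return the response body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def run_queries(base_url, queries, fetch=http_fetch):
    """Run several queries in parallel; results come back in input order."""
    urls = [build_select_url(base_url, params) for params in queries]
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return list(pool.map(fetch, urls))
```

Passing a different `fetch` function makes it possible to try the helper without a running server.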



On Wednesday 24 February 2010 14:42:18 Grant Ingersoll wrote:
 What would it be?
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: If you could have one feature in Solr...

2010-02-24 Thread Robert Muir
On Wed, Feb 24, 2010 at 9:22 AM, Markus Jelsma mar...@buyways.nl wrote:


 - stemmers for many more different languages


I don't want to hijack this thread, but i would like to know which languages
you are interested in!

-- 
Robert Muir
rcm...@gmail.com


Re: If you could have one feature in Solr...

2010-02-24 Thread Jan Høydahl / Cominvent
A mature document processing pipeline, perhaps integration of 
www.openpipeline.org which is Apache2.0 licensed


Re: If you could have one feature in Solr...

2010-02-24 Thread Markus Jelsma
Well, I don't have a specific request in mind. However, I can imagine a growing
internet market for Thai, Chinese and Arabic speaking people and the native
languages of the African continent. Providing them with stemmers to handle
plurals etc. will allow for a better search experience.

Also, other components might need overhaul, see SOLR-1078 for an example of a 
language specific issue with a filter.

Perhaps I should rephrase
"stemmers for many more different languages"

to

"support (i.e. stemmers, tokenizers, etc.) for many more different languages".



On Wednesday 24 February 2010 15:25:46 Robert Muir wrote:
 On Wed, Feb 24, 2010 at 9:22 AM, Markus Jelsma mar...@buyways.nl wrote:
 
 - stemmers for many more different languages
 
 
 
 I don't want to hijack this thread, but i would like to know which
  languages you are interested in!
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: If you could have one feature in Solr...

2010-02-24 Thread Paul
Limit the number of results when the results are sorted.

In other words, if the results are sorted by name and there are 10,000
results, then there will be items of low relevancy mixed in with the
results and it is hard for the user to find the relevant ones. If I
could say, give me no more than 200 results, sorted by name, then I
want the most relevant 200 results. (It would be ok to be
approximately 200. If there are documents that are the same relevancy,
then a few more than 200 would be acceptable.)
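
The behaviour described above can be sketched on the client side as follows. This is an illustration of the requested semantics, not an existing Solr feature; the `Hit` type and the function name are made up for the example. It keeps the highest-scoring hits up to the limit, lets ties at the cutoff score through (so slightly more than the limit may be returned), and only then sorts by name.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    score: float


def top_relevant_sorted_by_name(hits, limit):
    """Keep the `limit` best-scoring hits (plus ties at the cutoff), sorted by name."""
    if limit <= 0 or not hits:
        return []
    ranked = sorted(hits, key=lambda hit: hit.score, reverse=True)
    if len(ranked) > limit:
        # Score of the last hit that fits; anything tied with it is kept too.
        cutoff = ranked[limit - 1].score
        ranked = [hit for hit in ranked if hit.score >= cutoff]
    return sorted(ranked, key=lambda hit: hit.name)
```

The drawback of doing this in the application is that all candidate hits have to be fetched with their scores first, which is presumably why a server-side option is being asked for.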

On Wed, Feb 24, 2010 at 8:42 AM, Grant Ingersoll gsing...@apache.org wrote:
 What would it be?



Re: If you could have one feature in Solr...

2010-02-24 Thread Markus Jelsma
One additional feature within MoreLikeThis might be.. MoreLikeTHESE. This 
would not be the same as querying multiple documents and fetching MoreLikeThis 
documents for each individual result.

This would then actually only return MoreLikeThis documents based on multiple 
documents.

Another colleague could then more easily build a system that allows the end-user to 
select multiple documents and get suggestions on the combined input. Of 
course, we could use the existing component's results and cross-reference but 
that's not as simple as getting it from Solr in one go.
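
As a rough illustration of the "combined input" idea, the terms of all selected documents can be pooled before picking the most interesting ones. This is a deliberately simplified sketch: it ranks terms by raw frequency, which is an assumption made for brevity and not how the MoreLikeThis component scores terms, and the function name is hypothetical.

```python
from collections import Counter


def combined_interesting_terms(selected_texts, stopwords=frozenset(), max_terms=5):
    """Pool term counts over all selected documents and return the top terms."""
    totals = Counter()
    for text in selected_texts:
        totals.update(word for word in text.lower().split() if word not in stopwords)
    # Highest count first; ties broken alphabetically so the output is stable.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _count in ranked[:max_terms]]
```

The returned terms could then be sent as a single query, so the suggestions reflect what the selected documents have in common rather than each document separately.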


On Wednesday 24 February 2010 14:42:18 Grant Ingersoll wrote:
 What would it be?
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: If you could have one feature in Solr...

2010-02-24 Thread Stefano Cherchi
Decent documentation. 

S

 -- 
Anyone proposing to run Windows on servers should be prepared to explain 
what they know about servers that Google, Yahoo, and Amazon don't.
Paul Graham


A mathematician is a device for turning coffee into theorems.
Paul Erdos (who obviously never met a sysadmin)



- Original message -
 From: Grant Ingersoll gsing...@apache.org
 To: solr-user@lucene.apache.org
 Sent: Wed 24 February 2010, 14:42:18
 Subject: If you could have one feature in Solr...
 
 What would it be?







Re: If you could have one feature in Solr...

2010-02-24 Thread Grant Ingersoll

On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote:

 Decent documentation. 

What parts do you feel are lacking?  Or is it just across the board?  Wikis are 
both good and bad for documentation, IMO.

-Grant

Re: If you could have one feature in Solr...

2010-02-24 Thread straup
I actually found the documentation pretty great especially since (my 
experience, anyway) most Java projects seem to default to generic 
JavaDoc derived documentation (and that makes me cry).


That said, more cookbook-style recipes or stories would be helpful for 
some of the more esoteric parts of Solr.


Also: real-time indexing and geo.

Cheers,

On 2/24/10 9:54 AM, Grant Ingersoll wrote:


On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote:


Decent documentation.


What parts do you feel are lacking?  Or is it just across the board?  Wikis are 
both good and bad for documentation, IMO.

-Grant




Re: If you could have one feature in Solr...

2010-02-24 Thread Israel Ekpo
Grant,

One feature that I would like to see is the ability to do a bitwise search.

I have had to work around this with a Query Parser plugin that uses an
org.apache.lucene.search.Filter.

I think having this feature would be very nice, and I prefer it to searching
with multiple OR-type queries, especially when the bits are known ahead of
time.
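
For readers unfamiliar with the idea, the matching semantics of a bitwise search can be sketched as below. This is not the plugin mentioned above; the field name `flags` and the helper names are assumptions, and the filtering runs over plain dictionaries rather than an index.

```python
def matches_all_bits(value, mask):
    """True when every bit set in `mask` is also set in `value`."""
    return (value & mask) == mask


def matches_any_bit(value, mask):
    """True when at least one bit set in `mask` is set in `value`."""
    return (value & mask) != 0


def bitwise_filter(docs, field, mask, require_all=True):
    """Return the docs whose integer `field` matches the mask."""
    test = matches_all_bits if require_all else matches_any_bit
    return [doc for doc in docs if test(doc.get(field, 0), mask)]
```

A single mask replaces what would otherwise be one OR clause per acceptable stored value.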

I can submit the code as a patch once I get the approval to do so.

On Wed, Feb 24, 2010 at 2:20 PM, straup str...@gmail.com wrote:

 I actually found the documentation pretty great especially since (my
 experience, anyway) most Java projects seem to default to generic JavaDoc
 derived documentation (and that makes me cry).

 That said, more cookbook-style recipes or stories would be helpful for
 some of the more esoteric parts of Solr.

 Also: real-time indexing and geo.

 Cheers,


 On 2/24/10 9:54 AM, Grant Ingersoll wrote:


 On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote:

  Decent documentation.


 What parts do you feel are lacking?  Or is it just across the board?
  Wikis are both good and bad for documentation, IMO.

 -Grant





-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: If you could have one feature in Solr...

2010-02-24 Thread fergus mcmenemie
Chipping in

The wiki based nature of solr's documentation is rather different
compared to most payware and some open source products. However once
you get used to its style I found it quite adequate.

It also dawned on me that portions of Solr are advancing very quickly and
that the wiki style of documentation is the simplest method of allowing
the documentation to keep pace with the code. In my experience payware
products often have proper documentation that trails the state of the
code by years in some cases. New functionality is only described in the
release notes. I also ran into cases where vendors refused to fix simple
features as it would require changing the documentation. I quite like the
wiki!

However some of the pages are a mix of cookbook, examples and documentation
and are too big.


On 24/02/2010 19:20, straup wrote:
 I actually found the documentation pretty great especially since (my 
 experience, anyway) most Java projects seem to default to generic 
 JavaDoc derived documentation (and that makes me cry).
 
 That said, more cookbook-style recipes or stories would be helpful for 
 some of the more esoteric parts of Solr.
 
 Also: real-time indexing and geo.
 
 Cheers,
 
 On 2/24/10 9:54 AM, Grant Ingersoll wrote:

 On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote:

 Decent documentation.

 What parts do you feel are lacking?  Or is it just across the board?  Wikis 
 are both good and bad for documentation, IMO.

 -Grant
 
  

-- 

==
Fergus McMenemie    Email:fer...@twig.me.uk
Techmore Limited,   Phone:(UK) 07721 376021
Old Stables, Far End,   Home: (UK) 01522 810839
Boothby Graffoe, Lincoln,
LN5 0LG, England

Unix/Mac/Intranets/WWW  Analyst Programmer
==


Re: If you could have one feature in Solr...

2010-02-24 Thread Ron Mayer
Grant Ingersoll wrote:
 What would it be?

* Run a MapReduce-like job on all docs matching the results of a search?

I'm currently working on an app where I hope to be able to do
a query (hopefully using solr) and generate a map where every state
(or county or zip-code or school district or police beat) is colored
based on some attribute derived from some fields in the documents.

Interestingly it seems pleasantly easy to do if I'm just basing it
on the count of documents - since I can set up states, etc as facets.
But it'd be neat if instead of just getting the count from the facets,
if I could run more arbitrary math on the documents without having
to suck them into the application.
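
To make the request concrete, this is roughly the computation that currently has to happen in the application after pulling the matching documents out of the index. It is only a sketch; the field names `state` and `income` in the usage note are invented, and `aggregate_by` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by(docs, group_field, value_field, reducer=mean):
    """Group docs by one field and reduce another field within each group."""
    groups = defaultdict(list)
    for doc in docs:
        if group_field in doc and value_field in doc:
            groups[doc[group_field]].append(doc[value_field])
    return {key: reducer(values) for key, values in groups.items()}
```

For example, `aggregate_by(docs, "state", "income")` would give one average per state, which could then drive the map colouring; passing `reducer=max` or `reducer=sum` changes the statistic.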


Another use for this is that I'd like to make a quicker
way of drilling down on my documents than going one facet
at a time by showing the user a 2-dimensional table that
combines 2 facets.   For example, showing a table like this
on the page:

 Make
  FORD  GM  Honda  TOYOTA  []
MONDAY 17   23   4   2
TUESDAY11   9174 5
WEDNESDAY   3   69   1
...

and when the user clicks the 174 it automatically adds
the Vehicle Make = Toyota and Day of week = Tuesday
facets to the query.
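
A sketch of the two-facet table, computed in the application from already-fetched documents. The field names used in the usage note (`day`, `make`) and both helper names are assumptions; the filter strings use ordinary `field:"value"` query syntax.

```python
from collections import Counter


def cross_tab(docs, row_field, col_field):
    """Count documents for every (row value, column value) combination."""
    counts = Counter()
    for doc in docs:
        if row_field in doc and col_field in doc:
            counts[(doc[row_field], doc[col_field])] += 1
    return counts


def filters_for_cell(row_field, row_value, col_field, col_value):
    """Build the two filter strings to add when a table cell is clicked."""
    return [f'{row_field}:"{row_value}"', f'{col_field}:"{col_value}"']
```

With `counts = cross_tab(docs, "day", "make")`, the cell for Tuesday and Toyota is `counts[("TUESDAY", "TOYOTA")]`, and clicking it would add the output of `filters_for_cell("day", "TUESDAY", "make", "TOYOTA")` to the query.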





(I'm a solr newbie, so my apologies if this already exists, or
if it's just a bad idea, or if I should just be using another
tool for that (possibly in conjunction with solr), but)




Re: If you could have one feature in Solr...

2010-02-24 Thread Andy
The Solr documentation feels more like a reference guide detailing all the 
APIs. It's great for more advanced users, but as a beginner I often feel lost 
reading the doc.

It would be really helpful to have a more step-by-step, tutorial approach in 
the doc showing how to do things with tips & tricks. For me the gold standard 
of documentation is Django, the doc there is ridiculously good.

--- On Wed, 2/24/10, Grant Ingersoll gsing...@apache.org wrote:

From: Grant Ingersoll gsing...@apache.org
Subject: Re: If you could have one feature in Solr...
To: solr-user@lucene.apache.org
Date: Wednesday, February 24, 2010, 12:54 PM


On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote:

 Decent documentation. 

What parts do you feel are lacking?  Or is it just across the board?  Wikis are 
both good and bad for documentation, IMO.

-Grant


  

Re: If you could have one feature in Solr...

2010-02-24 Thread Andy
1) Built-in hierarchical faceting
Right now there're 2 patches, SOLR-64 and SOLR-792. SOLR-64 seems to be slated 
for 1.5 release but according to the wiki seems to have poor performance. 
SOLR-792 has better performance according to the wiki but it's unclear if it'll 
ever be part of the Solr distribution. Not sure what are the pros & cons of the 
2 patches. Hierarchical faceting is very common and it'd be nice to have this 
as a standard feature

2) Near real-time update & search

3) Partial update - something like LUCENE-1879

4) Automatic language detection & stemmer selection
I have document collections that are a mix of different languages. It'd be 
great to be able to automatically detect what language a specific document is 
in and select an appropriate stemmer for that language

5) Cloudy Solr - automatic sharding, replication, fail-over. Sorta like a 
Cassandra version of Solr...
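
Item 4 can be approximated at indexing time in the client. The sketch below guesses the language from stopword overlap and maps it to a field type; the tiny stopword sets, the field type names (`text_en`, `text_de`, `text_fr`), and the function names are all assumptions for illustration, and a real detector would need far more robust statistics.

```python
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to"},
    "german": {"der", "die", "und", "ist", "das"},
    "french": {"le", "la", "et", "est", "les"},
}

# Hypothetical field types, each assumed to be configured with its own stemmer.
FIELD_TYPE_FOR = {"english": "text_en", "german": "text_de", "french": "text_fr"}


def detect_language(text, default="english"):
    """Pick the language whose stopwords overlap most with the text."""
    words = set(text.lower().split())
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def field_type_for(text):
    """Choose the field type (and so the stemmer) for a document's text."""
    return FIELD_TYPE_FOR[detect_language(text)]
```

The indexing code would then put each document's text into a field of the chosen type.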



--- On Wed, 2/24/10, Grant Ingersoll gsing...@apache.org wrote:

From: Grant Ingersoll gsing...@apache.org
Subject: If you could have one feature in Solr...
To: solr-user@lucene.apache.org
Date: Wednesday, February 24, 2010, 8:42 AM

What would it be?



  

Re: If you could have one feature in Solr...

2010-02-24 Thread Gora Mohanty
On Wed, 24 Feb 2010 15:49:15 +0100
Markus Jelsma mar...@buyways.nl wrote:

 Well, I don't have a specific request in mind. However, I can
 imagine a growing internet market for Thai, Chinese and Arabic
 speaking people and the native languages of the African
 continent. Providing them with stemmers to handle plurals etc.
 will allow for a better search experience.

The same is true for Indian languages.

Actually, the state of the art for NLP in Indian languages is
quite poor, at least in the open-source world. So, even before 
stemmers, one should address things like phonetic analysers,
spellcheckers, stop words, etc. This latter part is possible now,
and we have an alpha-level implementation for Hindi that we would
be glad to contribute once it is stable.

Which reminds me: One thing that I would like to see immediately
in Solr is to have the Metaphone/DoubleMetaphone phonetic analysers
use a configuration file for phonetic rules, rather than having
these hard coded. The aspell library does this, for example. Please
see http://aspell.net/man-html/Phonetic-Code.html#Phonetic-Code
for an explanation of its rules.
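
To illustrate what configuration-driven phonetic rules could look like, here is a minimal sketch. The rule format (one "PATTERN REPLACEMENT" pair per line, `#` for comments, first matching rule wins) is invented for this example; it is neither the aspell format nor the Metaphone algorithm.

```python
def parse_rules(text):
    """Parse 'PATTERN REPLACEMENT' lines; '#' starts a comment."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, _sep, replacement = line.partition(" ")
        rules.append((pattern.upper(), replacement.strip().upper()))
    return rules


def phonetic_code(word, rules):
    """Rewrite a word left to right, applying the first rule that matches."""
    word = word.upper()
    out = []
    i = 0
    while i < len(word):
        for pattern, replacement in rules:
            if word.startswith(pattern, i):
                out.append(replacement)
                i += len(pattern)
                break
        else:
            # No rule matched at this position: keep the character as-is.
            out.append(word[i])
            i += 1
    return "".join(out)
```

Because the rules live in a text file rather than in code, a table for Hindi or any other language could be swapped in without recompiling.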

Regards,
Gora


Re: If you could have one feature in Solr...

2010-02-24 Thread Matthew Rushton
Real time search would be awesome.
-Matt