Re: solr

2010-08-21 Thread Sumit Arora
Please follow the guidelines at:

http://lucene.apache.org/solr/tutorial.html

/Sumit

On Sat, Aug 21, 2010 at 11:25 AM, ankita shinde
<ankita.shind...@gmail.com> wrote:

> Hello,
> I am new to solr.
> Can anyone please guide me how to install and use solr?
> Reply.
> -Ankita Shinde



Re: Require some advice

2010-08-21 Thread Tommaso Teofili
Hi Pavan,
you may want to plug UIMA in as a custom UpdateRequestProcessor [1] while
indexing data (I am working on such a use case). This way you could extract
entities and add them either as dynamicFields or as predefined (fixed) fields.

2010/8/12 Michael Griffiths <mgriffi...@am-ind.com>

> While there are some decent open source entity extraction tools, they are
> focused on processing sentences and paragraphs. The structural differences
> in text messages mean you'd need to do a fair amount of work to get decent
> entity extraction.
>
> That said, you may want to look into simple word/phrase matching if your
> domain is sufficiently small. Use RegEx to extract ZIP, use dictionaries to
> extract city/area, skills, and names. Much simpler and cheaper.



In UIMA you have some components that may be useful for the above cases
(DictionaryAnnotator, ConceptMapper, Tagger, RegExAnnotator [2]). However, as
Michael underlined, you have to consider the effort needed to understand,
use and eventually customize such components. UIMA is well suited to
large-scale collections of data and lets you work with a flexible and
customizable analysis pipeline that may change and be enriched in the future,
but you have to evaluate carefully whether you need it.
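
For comparison, the simple regex-and-dictionary matching suggested in the
quoted message can be sketched in a few lines of plain Python, without UIMA.
This is only an illustration: the dictionaries, the US-style ZIP pattern and
the function name are assumptions, not part of any Solr or UIMA API.

```python
import re

# Hypothetical dictionaries; a real deployment would load these from files.
CITIES = {"boston", "new york", "san jose"}
SKILLS = {"java", "python", "solr"}

# US ZIP code: five digits, optionally followed by a four-digit extension.
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def dictionary_hits(lowered_text, dictionary):
    """Return the dictionary entries that occur as whole words or phrases."""
    return sorted(
        entry
        for entry in dictionary
        if re.search(r"\b" + re.escape(entry) + r"\b", lowered_text)
    )


def extract_entities(text):
    """Extract ZIP codes by regex and cities/skills by dictionary lookup."""
    lowered = text.lower()
    return {
        "zip": ZIP_RE.findall(text),
        "city": dictionary_hits(lowered, CITIES),
        "skill": dictionary_hits(lowered, SKILLS),
    }


entities = extract_entities("Java dev needed in Boston 02134, Solr a plus")
```

The resulting lists could then be sent to Solr as extra fields of the
document, which is roughly what the UIMA annotators would do inside an
UpdateRequestProcessor.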


2010/8/12 Nagelberg, Kallin <knagelb...@globeandmail.com>

> Try this,
>
> http://viewer.opencalais.com/


The OpenCalais service is wrapped as a UIMA analysis engine and may be
called inside a UIMA pipeline together with other components (see above) or
services (e.g. the UIMA-wrapped AlchemyAPI service [3]).
That said, this makes sense only if you are strongly focused on searching
over text and its extracted entities.
My 2 cents,
Tommaso

[1] : http://wiki.apache.org/solr/UpdateRequestProcessor
[2] : http://uima.apache.org/annotators.html
[3] : http://svn.apache.org/viewvc/uima/sandbox/trunk/AlchemyAPIAnnotator/


solr

2010-08-21 Thread ankita shinde
Hi all,
Is there a need to allot a unique id to every file?
Do we have to specify the id manually, or does Solr do it?
How do I allot a unique id to a text file?


Re: solr

2010-08-21 Thread Rafał Kuć
Hello!

> Is there a need to allot a unique id to every file?

You don't need one. A unique id is not mandatory, but many features won't
work without it.

> Do we have to specify the id manually, or does Solr do it?

Solr doesn't do it automatically; you have to do it.

> How do I allot a unique id to a text file?

Just generate an id in your application and pass it in, e.g., the XML file.

If you have questions about the unique id, this page is the place
for you: http://wiki.apache.org/solr/UniqueKey
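
As a rough sketch of "generate an id in your application and pass it in the
XML file": the snippet below builds a Solr XML <add> message with an id that
the application chooses. The field names "id" and "text" are assumptions
(use whatever your schema.xml declares), and build_add_xml is a hypothetical
helper, not a Solr API.

```python
import uuid
import xml.etree.ElementTree as ET


def build_add_xml(text, doc_id=None):
    """Build a Solr XML <add> message for one document.

    If no id is given, a random UUID is generated by the application.
    """
    doc_id = doc_id or str(uuid.uuid4())
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # Field names are assumptions; they must match the schema in use.
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name="text").text = text
    return ET.tostring(add, encoding="unicode")


xml_message = build_add_xml("contents of my text file", doc_id="file-0001")
```

The resulting string can be saved to a file and posted to the update handler
the same way the tutorial posts the files in exampledocs.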

-- 
Regards,
 Rafał Kuć



solr

2010-08-21 Thread ankita shinde
Hi,
Does all the data to be indexed have to be in the exampledocs folder?
How do I import data from MySQL?
I have tried the steps on http://wiki.apache.org/solr/DataImportHandler,
but it's giving me an error: could not create importer.dataimporter.
What does it mean?
I am completely new to Solr.
How do I configure Solr?


Possible to have more than 1 uniqueKey fields in a document?

2010-08-21 Thread Andy
Is it possible to define more than 1 uniqueKey fields per document in 
schema.xml?


  


Duplicate docs when mergin

2010-08-21 Thread Andrew Clegg


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Duplicate-docs-when-mergin-tp1261979p1261979.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to Debug Sol-Code in Eclipse ?!

2010-08-21 Thread stockii

Hello,

Can anyone give me some tips on debugging the Solr code in Eclipse? Or do I
need Apache Ant to do this?

Thx =)


Re: solr

2010-08-21 Thread Peter Karich
Hi Ankita,

First: thanks for trying Apache Solr.

> Does all the data to be indexed have to be in the exampledocs folder?

No. And there are several ways to push data into Solr: via indexing,
DataImportHandler, SolrJ, ...

I know that getting comfortable with a new project is a bit complicated
at first, but you should try to read some of the information available on
the wiki etc. before posting in a rush.
(Keep in mind that a mailing list is mostly driven by people in their
free time.)

Here are some links:
http://wiki.apache.org/solr/FAQ
http://wiki.apache.org/solr/SolrResources
http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-1
http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2

Another nice piece of documentation is:
http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr/Reference-Guide

I hope you don't misunderstand me! So, ask if you need help, but try
to dig a bit deeper first!

Regards,
Peter.

> Hi,
> Does all the data to be indexed have to be in the exampledocs folder?
> How do I import data from MySQL?
> I have tried the steps on http://wiki.apache.org/solr/DataImportHandler,
> but it's giving me an error: could not create importer.dataimporter.
> What does it mean?
> I am completely new to Solr.
> How do I configure Solr?
   


Re: /update/extract

2010-08-21 Thread Jayendra Patil
The Extract Request Handler invokes the classes from the extraction package.

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/extraction/src/main/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java

This is packaged in the apache-solr-cell jar.

Regards,
Jayendra

On Thu, Aug 19, 2010 at 10:04 AM, satya swaroop <sswaro...@gmail.com> wrote:

> Hi all,
>   when we use the extract request handler, what class gets invoked? I
> need to know the flow through the classes when we send any files to Solr.
> Can anybody tell me the classes or any sources where I can get the answer?
> Or can anyone tell me what classes get invoked when we start
> Solr? I'd be thankful if anybody can help me with this.
>
> Regards,
> satya



Re: Duplicate docs when merging indices?

2010-08-21 Thread Gora Mohanty
On Sat, 21 Aug 2010 05:26:59 -0700 (PDT)
Andrew Clegg <andrew.cl...@gmail.com> wrote:
[...]
> If I merge two indices with CoreAdmin, as detailed here...
>
> http://wiki.apache.org/solr/MergingSolrIndexes
>
> What happens to duplicate documents between the two? i.e. those
> that have the same unique key.
>
> What decides which copy takes precedence? Will documents get
> indexed multiple times, or will the second one just get skipped?
[...]

I have not used CoreAdmin, but with MergeTool I know from personal
experience that duplicates would be created. I imagine
that the same is the case for CoreAdmin, as Solr/Lucene allows
duplicate IDs.

Regards,
Gora


Re: facets - id and display value

2010-08-21 Thread Lance Norskog
Faceting harvests the fields that are already indexed (so you have to
both store and index the fields) and uses Java object refs (pointers),
without copying the facet values. You know how log files have
multi-line exception stacks & the like? The multi-line exception
stacks after the real log line tend to be the same. I grabbed all of
the lines after each log line and made facets out of them. Worked
quite well for counting "this exception stack happened 42 times, this
other one 250 times". So huge string fields work as facets.
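
To make the exception-stack idea concrete, here is a client-side sketch of
the grouping step: collect the continuation lines that follow each log line
into one string and count identical strings. In Solr each such string would
be indexed as one field value and counted by faceting; the log format (lines
starting with an ISO-style date) and the function name are assumptions.

```python
import re
from collections import Counter

# Assumption: a "real" log line starts with an ISO-style date; anything else
# is a continuation line (for example a stack trace frame).
LOG_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def stack_facets(lines):
    """Count identical multi-line blocks that follow log lines."""
    counts = Counter()
    block = []
    for line in lines:
        if LOG_LINE.match(line):
            # A new log line closes the block collected so far.
            if block:
                counts["\n".join(block)] += 1
            block = []
        else:
            block.append(line.rstrip())
    if block:
        counts["\n".join(block)] += 1
    return counts


sample = [
    "2010-08-21 10:00:01 ERROR request failed",
    "java.lang.NullPointerException",
    "\tat com.example.Foo.bar(Foo.java:42)",
    "2010-08-21 10:00:02 INFO ok",
    "2010-08-21 10:00:03 ERROR request failed",
    "java.lang.NullPointerException",
    "\tat com.example.Foo.bar(Foo.java:42)",
]
counts = stack_facets(sample)
```

Each key of counts is exactly the kind of long string that would serve as a
facet value.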

I don't know if 'facet.prefix' on 50 characters is faster than 'q=' on
200 characters.

Sending a giant query is easy: use a POST instead of a GET.
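
A minimal sketch of sending the query parameters in a POST body rather than
in the URL, using only the Python standard library. The Solr URL is an
assumption (a default local install), and the request is only built here,
not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a default local Solr; adjust host, port and path as needed.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_post_query(params):
    """Return a Request that carries the query parameters in a POST body."""
    body = urlencode(params).encode("utf-8")
    return Request(
        SOLR_SELECT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# A deliberately huge query value that would be awkward in a GET URL.
request = build_post_query({"q": "category:" + "x" * 5000, "rows": 10})
# urllib.request.urlopen(request) would send it; omitted so this runs offline.
```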

If searching on giant facet strings really is a problem, add a hash
code to each facet string. Then, add a separate matching field in each
document that only stores that hashcode. Now, instead of searching on
the giant facet, you pull the hashcode out of it and search the
separate field for that.
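
The hash idea could look roughly like this on the client side. SHA-1, the
"|" separator and the field name stack_hash are all assumptions chosen for
the illustration; any stable hash and any separator that cannot occur in a
hex digest would do.

```python
import hashlib


def facet_hash(facet_value):
    """Short, stable hash of a long facet string (SHA-1 hex digest here)."""
    return hashlib.sha1(facet_value.encode("utf-8")).hexdigest()


def with_hash(facet_value, sep="|"):
    """Value to index in the facet field: '<hash>|<original value>'."""
    return facet_hash(facet_value) + sep + facet_value


def filter_query(returned_facet_value, hash_field="stack_hash", sep="|"):
    """Pull the hash out of a returned facet value and build an fq on it."""
    return hash_field + ":" + returned_facet_value.split(sep, 1)[0]


long_value = "java.lang.NullPointerException\n\tat com.example.Foo.bar(Foo.java:42)"
indexed_facet_value = with_hash(long_value)   # goes into the facet field
indexed_hash_value = facet_hash(long_value)   # goes into the separate field
fq = filter_query(indexed_facet_value)
```

The filter then only ever contains a 40-character hash, no matter how long
the facet string is.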


On Fri, Aug 20, 2010 at 9:56 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
>> A common way is to make a facet string of categoryId-2_name_imageurl.
>> Then in your UI display the categoryId part of the facet.
>
> I've been thinking about doing something like this for the same purposes.
> Will having an extra long facet string like that have any effect on
> faceting performance? How about facet sorting with facet.sort=index? In my
> case, the first part of the facet string would be a 'sortable' value that
> sorts how I want, not just an id.
>
> I use facet.sort=index, but my display labels don't actually sort the way I
> want, so I'm thinking of making a sort key that does, and storing
> sortkey_label in the actual facet value. But I worry this may have an
> effect on performance if the string gets really long. But I'm thinking/hoping
> it won't -- at least for faceting the length of the string shouldn't matter, I
> think, but I'm not sure about sorting. [Obviously you have to make sure
> not to accidentally store the same 'id' with differently serialized
> 'metadata', or you'd wind up with two facet values where you meant to have
> one.]
>
> Is there any reason I couldn't use some non-printing control char as the
> separator, instead of the ASCII underscore used in that example?
>
> And then the other thing is, once I have these weird long facet strings with
> embedded 'metadata', if I actually want to 'fq' on one, I need to pass that
> whole weird string in the fq, clearly. How do people generally deal with
> this, using this technique? Just do it, pass the whole string? Use some sort
> of 'prefix' technique (I guess that would be the * wildcard in the fq)? Use
> two different solr fields, one for faceting with embedded metadata, and a
> different one with the same values without embedded metadata for actual 'fq'
> filtering?
>
> Thanks for any tips,
>
> Jonathan




-- 
Lance Norskog
goks...@gmail.com


Re: solr

2010-08-21 Thread Lance Norskog
This will make a unique key for you:

In <types>:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />

In <fields>:
<field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>


2010/8/21 Rafał Kuć <ra...@alud.com.pl>:
> Hello!
>
>> Is there a need to allot a unique id to every file?
>
> You don't need one. A unique id is not mandatory, but many features won't
> work without it.
>
>> Do we have to specify the id manually, or does Solr do it?
>
> Solr doesn't do it automatically; you have to do it.
>
>> How do I allot a unique id to a text file?
>
> Just generate an id in your application and pass it in, e.g., the XML file.
>
> If you have questions about the unique id, this page is the place
> for you: http://wiki.apache.org/solr/UniqueKey
>
> --
> Regards,
>  Rafał Kuć





-- 
Lance Norskog
goks...@gmail.com


Re: Possible to have more than 1 uniqueKey fields in a document?

2010-08-21 Thread Lance Norskog
There can be as many unique fields as you want, but you can only specify
one as the uniqueKey. That is used for Distributed Search and deduplication.

Indexing might work better if you concatenate the different unique
values into one field.
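
One possible reading of "concatenate the different unique values into one
field", sketched on the indexing client. The "|" separator and the helper
name are assumptions; the only real requirement is that the separator cannot
occur inside any of the parts, otherwise two different combinations could
collide.

```python
def composite_key(*parts, sep="|"):
    """Join several individually unique values into one uniqueKey value."""
    cleaned = [str(part) for part in parts]
    for part in cleaned:
        if sep in part:
            raise ValueError("separator %r occurs in key part %r" % (sep, part))
    return sep.join(cleaned)


# An alphanumeric id from one application and a numeric id from another.
key = composite_key("ABC123", 42)
```

The individual values can still be indexed in their own ordinary fields so
that each application can search by the id it knows.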

On Sat, Aug 21, 2010 at 3:27 AM, Andy <angelf...@yahoo.com> wrote:
> Is it possible to define more than 1 uniqueKey fields per document in
> schema.xml?







-- 
Lance Norskog
goks...@gmail.com


Re: How to Debug Sol-Code in Eclipse ?!

2010-08-21 Thread Lance Norskog
Running unit tests is easy, once you set the right 'current directory'
so that unit tests can find their resource files. I have found that if
I get a full set of unit tests for something, I don't have to debug it
in the full app.

Running the whole thing as a servlet has the whole servlet engine
setup thing, which I avoid.

Running as EmbeddedSolr might be easy, I haven't tried it.

I usually make a separate empty Java project and import source and
libs as needed. I do a lot of  'search everything for this string' so
having the whole source tree in the project just slows me down. This
does remove the ability to use the svn/git management, but I don't
mind that.

On Sat, Aug 21, 2010 at 5:27 AM, stockii <st...@shopgate.com> wrote:
>
> Hello,
>
> Can anyone give me some tips on debugging the Solr code in Eclipse? Or do I
> need Apache Ant to do this?
>
> Thx =)




-- 
Lance Norskog
goks...@gmail.com


Re: Possible to have more than 1 uniqueKey fields in a document?

2010-08-21 Thread Andy
I'm still a bit confused. Can I define 2 uniqueKey fields in schema.xml?

I want to use 2 outside apps. One defines a uniqueKey that is a mix of letters
and numbers. Another app requires a uniqueKey of the type long.

Obviously the 2 requirements aren't compatible. I'm trying to see if it's
possible to define 2 uniqueKeys so each app could have its own one.

--- On Sat, 8/21/10, Lance Norskog <goks...@gmail.com> wrote:

> From: Lance Norskog <goks...@gmail.com>
> Subject: Re: Possible to have more than 1 uniqueKey fields in a document?
> To: solr-user@lucene.apache.org
> Date: Saturday, August 21, 2010, 5:23 PM
>
> There can be as many as you want. But you can only specify one as the
> uniqueKey. That is used for Distributed Search and deduplication.
>
> Indexing might work better if you concatenate the different unique
> values into one field.
>
> On Sat, Aug 21, 2010 at 3:27 AM, Andy <angelf...@yahoo.com> wrote:
>> Is it possible to define more than 1 uniqueKey fields per document in
>> schema.xml?
>
> --
> Lance Norskog
> goks...@gmail.com


  


Re: Possible to have more than 1 uniqueKey fields in a document?

2010-08-21 Thread Lance Norskog
You can only have one field marked as the unique key in Solr. That's
it. If you happen to have two unique values per document, that is OK.
Only one of them can be the official unique key.

It's just like a primary key in a database table. You can't have two primaries.

There can only be one. - Highlander

On Sat, Aug 21, 2010 at 5:00 PM, Andy <angelf...@yahoo.com> wrote:
> I'm still a bit confused. Can I define 2 uniqueKey fields in schema.xml?
>
> I want to use 2 outside apps. One defines a uniqueKey that is a mix of
> letters and numbers. Another app requires a uniqueKey of the type long.
>
> Obviously the 2 requirements aren't compatible. I'm trying to see if it's
> possible to define 2 uniqueKeys so each app could have its own one.
>
> --- On Sat, 8/21/10, Lance Norskog <goks...@gmail.com> wrote:
>
>> From: Lance Norskog <goks...@gmail.com>
>> Subject: Re: Possible to have more than 1 uniqueKey fields in a document?
>> To: solr-user@lucene.apache.org
>> Date: Saturday, August 21, 2010, 5:23 PM
>>
>> There can be as many as you want. But you can only specify one as the
>> uniqueKey. That is used for Distributed Search and deduplication.
>>
>> Indexing might work better if you concatenate the different unique
>> values into one field.
>>
>> On Sat, Aug 21, 2010 at 3:27 AM, Andy <angelf...@yahoo.com> wrote:
>>> Is it possible to define more than 1 uniqueKey fields per document in
>>> schema.xml?
>>
>> --
>> Lance Norskog
>> goks...@gmail.com







-- 
Lance Norskog
goks...@gmail.com


Autocomplete and Sorting on multiple multi-value/single-value fields

2010-08-21 Thread Neil Lott
Hi,

I'm wondering if anyone has run across this issue before.  I do understand that 
you cannot sort on a multivalued field -- so I'm looking for alternatives
people have used.

Let's say I have nine fields:

<field name="title" type="text" indexed="true" stored="true"
       required="true"/>
<field name="titleac" type="autocomplete" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true"/>
<field name="titlesort" type="alphaOnlySort" indexed="true"
       stored="true"/>

<field name="cast" type="text" indexed="true" stored="true"
       required="true" multiValued="true"/>
<field name="castac" type="autocomplete" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true" multiValued="true"/>

<field name="crew" type="text" indexed="true" stored="true"
       required="true" multiValued="true"/>
<field name="crewac" type="autocomplete" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true" multiValued="true"/>

The text field type is standard:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

The autocomplete field type is pretty standard as well:

<fieldType name="autocomplete1" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="100"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
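
For readers unfamiliar with edge n-grams, the following Python sketch mimics
what the two analyzer chains above are expected to produce. It is a
simplified model, not Solr code: the whole field value is treated as one
token (keyword tokenizer), lower-cased and trimmed, and at index time every
prefix from minGramSize to maxGramSize characters is emitted.

```python
def index_terms(value, min_gram=1, max_gram=100):
    """Model of the index analyzer: one token, lower-cased, trimmed, then
    expanded into all of its leading prefixes (edge n-grams)."""
    token = value.lower().strip()
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def query_term(value):
    """Model of the query analyzer: same normalisation, no n-grams."""
    return value.lower().strip()


terms = index_terms(" Dr. No ")
matches = query_term("DR") in terms
```

Because the query side is not n-grammed, a typed prefix such as "DR" becomes
the single term "dr", which matches any indexed value that starts with it.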

I need the sort to be case-sensitive and to include punctuation etc., so that
field type looks like this:

<fieldType name="alphaOnlySort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

So if I do this:


http://localhost:8983/solr/core/select/?q=titleac:dr&version=2.2&start=0&rows=100&indent=on&fl=title&sort=titlesort asc

Everything works and I get a set of autocompleted results starting with "dr"
in all forms, sorted.  Exactly what I want.
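
As an aside, the working request above can be assembled with proper URL
encoding like this. The sketch assumes that the parameters are q, version,
start, rows, indent, fl=title and sort=titlesort asc, joined with "&"; host,
core name and handler path are taken from the request above, and the helper
name is hypothetical. The prefix is not escaped for Solr query syntax.

```python
from urllib.parse import urlencode

# Assumption: same host, core name and handler path as in the request above.
SELECT_URL = "http://localhost:8983/solr/core/select/"


def autocomplete_url(prefix, start=0, rows=100):
    """Build the single-field autocomplete request with Solr-side sorting."""
    params = [
        ("q", "titleac:" + prefix),
        ("version", "2.2"),
        ("start", start),
        ("rows", rows),
        ("indent", "on"),
        ("fl", "title"),
        ("sort", "titlesort asc"),
    ]
    return SELECT_URL + "?" + urlencode(params)


url = autocomplete_url("dr")
```

Paging then only requires changing start, so the client never has to load
and sort the full result set itself.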

The problem is that I also need to do this:

http://localhost:8983/solr/core/select/?q=(titleac:dr or castac:dr)&version=2.2&start=0&rows=100&indent=on&fl=title,cast

(and the results need to be sorted across both the title field or a match in 
the multivalued cast field)

And I also need to do this:

http://localhost:8983/solr/core/select/?q=(titleac:dr or castac:dr or crewac:dr)&version=2.2&start=0&rows=100&indent=on&fl=title,cast,crew

(and the results need to be sorted across both the title field or a match in 
the multivalued cast field or a match in the multivalued crew field)

As you can see I'm trying to autocomplete across multiple fields some of which 
are multi-valued and then sort those results in solr so solr does all my paging 
work.  

This way I don't have to load the full results sets into my jvm client and then 
manually sort them each time.  

You can also see I'm trying to make it into one query as my assumption is that 
this will take the least amount of time.

Would anyone happen to have suggestions on how I'm approaching this problem?

Thanks,

Neil