How can I get a correct stemmed query?
Hi~. I'm a beginner building a search system with Solr 1.4.1 and Lucene 2.9.2. My custom Analyzer and filter produce the correct Lucene query from the given input, but no results are displayed. Here is my Analyzer source.

--
public class KLTQueryAnalyzer extends Analyzer {
    public static final Version LUCENE_VERSION = Version.LUCENE_29;
    public static int QUERY_MIN_LEN_WORD_FILTER = 1;
    public static int QUERY_MAX_LEN_WORD_FILTER = 40;
    public int elapsedTime = 0;

    @Override
    public TokenStream tokenStream(String paramString, Reader reader) {
        StandardTokenizer tokenizer = new StandardTokenizer(
            du.utas.mcrdr.ir.lucene.WebDocIR.LUCENE_VERSION, reader);
        TokenStream tokenStream = new LengthFilter(tokenizer,
            QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER);
        tokenStream = new LowerCaseFilter(tokenStream);

        // My custom stemmer
        KLTSingleWordStemmer stemer = new KLTSingleWordStemmer(
            QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER);

        // My custom analyzer filter. This filter returns a sub-merged query.
        // e.g. INPUT: flyaway  ->  RETURN VALUE: fly +body:away
        tokenStream = new KLTQueryStemFilter(tokenStream, stemer, this);
        return tokenStream;
    }
}
--

Example:
User input query: +body:flyaway
Expected analyzed query: +body:fly +body:away
Indexed data: body = "fly away"

I'm expecting 1 document returned from the index, but I get no results. My custom flow:
1. User input query: +body:flyaway
2. The Analyzer returns: fly +body:away
3. Solr attaches the search field prefix (+body) to the filter's returned query, as defined in schema.xml (default operator AND).
4. I indexed 1 document whose "body" field contains the phrase "fly away".
5. I expect 1 document for the query +body:fly +body:away, but 0 documents are returned.

What is the problem? Any help is appreciated~ :

--
View this message in context: http://lucene.472066.n3.nabble.com/How-can-i-get-collect-stemmed-query-tp1723055p1723055.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How can I get a correct stemmed query?
Are you using KLTQueryAnalyzer outside of Solr (as a pre-process)? Or did you define a fieldType in schema.xml that uses KLTQueryAnalyzer? Can you append debugQuery=on to your search URL and paste the output?

--- On Mon, 10/18/10, Jerad <ag...@naver.com> wrote:
From: Jerad <ag...@naver.com>
Subject: How can I get a correct stemmed query?
To: solr-user@lucene.apache.org
Date: Monday, October 18, 2010, 9:15 AM
Re: How can I use the SolrJ binary format for indexing?
Hi, you can try to parse the XML in Java yourself and then push the resulting SolrInputDocuments to Solr via SolrJ. Setting the format to binary and using the streaming update server should improve performance, but I am not sure... and performant (and memory-friendly!) XML reading in Java is another topic ;-) Regards, Peter.

Hi all, I have a huge number of XML files to index. I want to index using the SolrJ binary format to get a performance gain, because I heard that indexing from XML files is quite slow. But I don't know how to index through the SolrJ binary format and can't find examples. Please give some help. Thanks. -- http://jetwick.com twitter search prototype
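As a rough sketch of the "parse the XML yourself" half of this advice: the snippet below streams one record in Solr's update-XML style (<field name="...">value</field>) into a plain map, using only the JDK's StAX API. Each map would then be copied field-by-field into a SolrInputDocument and sent with SolrJ (in the 1.4-era API, e.g. StreamingUpdateSolrServer plus a BinaryRequestWriter). The class name and the exact element layout here are illustrative assumptions, not code from the thread.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class XmlToFields {
    // Stream a <doc><field name="...">value</field>...</doc> record into a map.
    // Each resulting map would become one SolrInputDocument pushed via SolrJ.
    static Map<String, String> parseDoc(String xml) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        String name = null;
        StringBuilder text = new StringBuilder();
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    if ("field".equals(r.getLocalName())) {
                        name = r.getAttributeValue(null, "name");
                        text.setLength(0); // start collecting this field's value
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    text.append(r.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if ("field".equals(r.getLocalName()) && name != null) {
                        fields.put(name, text.toString());
                        name = null;
                    }
                    break;
            }
        }
        return fields;
    }
}
```

Streaming (StAX) rather than DOM parsing is what keeps memory flat on "a huge amount of xml files", which is the "+less mem" point above.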
Re: query between two dates
You'll have to supply your dates in the format Solr expects (e.g. 2010-10-19T08:29:43Z, not 2010-10-19). If you don't need millisecond granularity you can use the DateMath syntax to specify that. Please also check http://wiki.apache.org/solr/SolrQuerySyntax.

On 17 October 2010 10:54, nedaha <neda...@gmail.com> wrote:

Hi there, at first I have to explain the situation. I have two indexed fields named tdm_avail1 and tdm_avail2 that are arrays of different dates. This is a sample doc:

<arr name="tdm_avail1">
  <date>2010-10-21T08:29:43Z</date>
  <date>2010-10-22T08:29:43Z</date>
  <date>2010-10-25T08:29:43Z</date>
  <date>2010-10-26T08:29:43Z</date>
  <date>2010-10-27T08:29:43Z</date>
</arr>
<arr name="tdm_avail2">
  <date>2010-10-19T08:29:43Z</date>
  <date>2010-10-20T08:29:43Z</date>
  <date>2010-10-21T08:29:43Z</date>
  <date>2010-10-22T08:29:43Z</date>
</arr>

My search form has two fields, check-in date and check-out date. I want Solr to compare the range the user enters in the search form with the values of tdm_avail1 and tdm_avail2, and return the doc if all dates between the check-in and check-out dates match the tdm_avail1 or tdm_avail2 values. For example, if the user enters check-in date 2010-10-19 and check-out date 2010-10-21, which matches tdm_avail2, the doc must be returned; but if the user enters check-in date 2010-10-25 and check-out date 2010-10-29, the doc must not be returned. So I want the query that gives me the mentioned result. Could you help me please? Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/query-between-two-date-tp1718566p1718566.html
Re: How do you programmatically create new cores?
An HTTP GET call is made simply by entering the URL into your browser, as shown in the example on the wiki:

http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schema_file_name.xml&dataDir=data

-----Original Message-----
From: Tharindu Mathew [mailto:mcclou...@gmail.com]
Sent: Sunday, 17 October 2010 18:07
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: How do you programmatically create new cores?

Hi Marc, thanks for the reply. So as I understand it, I need to make an HTTP GET call with the action parameter set to CREATE to dynamically create a core? I do not see an API to do this anywhere.

On Oct 17, 2010, at 3:54 PM, Marc Sturlese <marc.sturl...@gmail.com> wrote:

You have to create the core's folder, with its conf inside, in the Solr home. Once that's done you can call the create action of the admin handler: http://wiki.apache.org/solr/CoreAdmin#CREATE If you need to dynamically create, start and stop lots of cores there's this patch, but I don't know its current state: http://wiki.apache.org/solr/LotsOfCores

--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-you-programatically-create-new-cores-tp1706487p1718648.html
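For the programmatic side of the question, the same CREATE call can be issued from any HTTP client by assembling the URL shown on the wiki. A JDK-only sketch (the host, port, and paths are placeholder assumptions; a real client would then GET the returned URL):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CoreAdminCreate {
    // Assemble the CoreAdmin CREATE url from the wiki example.
    // A real client would then issue a GET on this url.
    static String createCoreUrl(String solrBase, String core, String instanceDir) {
        return solrBase + "/admin/cores?action=CREATE"
                + "&name=" + URLEncoder.encode(core, StandardCharsets.UTF_8)
                + "&instanceDir=" + URLEncoder.encode(instanceDir, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(createCoreUrl("http://localhost:8983/solr", "coreX", "/var/solr/coreX"));
    }
}
```

URL-encoding the parameters matters once instance directories contain slashes or spaces; the browser example above works only because the values happen to be URL-safe.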
Re: query between two dates
Thanks for your reply. I know about the Solr date format! Check-in and check-out dates are in a user-friendly format in the search form for the system's users, and I convert the format in code before sending them to Solr. What I want to know is how to build a query that compares the range between the check-in and check-out dates with the separate individual dates I have in the Solr index. For example: the check-in date is 2010-10-19T00:00:00Z and the check-out date is 2010-10-21T00:00:00Z. When I build a query from my application I have a date range, but in the Solr index I have separate dates. So how can I compare them to get the appropriate result?

--
View this message in context: http://lucene.472066.n3.nabble.com/query-between-two-date-tp1718566p1723752.html
Re: How can I get a correct stemmed query?
Oops, I'm sorry! I found a mistake in the previously posted source (the main class name was wrong :). This is the correct analyzer source.

---
public class MyCustomQueryAnalyzer extends Analyzer {
    public static final Version LUCENE_VERSION = Version.LUCENE_29;
    public static int QUERY_MIN_LEN_WORD_FILTER = 1;
    public static int QUERY_MAX_LEN_WORD_FILTER = 40;
    public int elapsedTime = 0;

    @Override
    public TokenStream tokenStream(String paramString, Reader reader) {
        StandardTokenizer tokenizer = new StandardTokenizer(
            du.utas.mcrdr.ir.lucene.WebDocIR.LUCENE_VERSION, reader);
        TokenStream tokenStream = new LengthFilter(tokenizer,
            QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER);
        tokenStream = new LowerCaseFilter(tokenStream);

        // My custom stemmer
        MyCustomSingleWordStemmer stemer = new MyCustomSingleWordStemmer(
            QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER);

        // My custom analyzer filter. This filter returns a sub-merged query.
        // e.g. INPUT: flyaway  ->  RETURN VALUE: fly +body:away
        tokenStream = new KLTQueryStemFilter(tokenStream, stemer, this);
        return tokenStream;
    }
}
---

[Additional info]

1. MyCustomQueryAnalyzer was built outside of Solr. I made this analyzer outside the Solr package, packaged it as a .jar, and placed it at ~/Solr/example/work/Jetty_0_0_0_0_8982_solr.war__solr__-2c5peu/webapp/WEB-INF/lib

2. I edited the field type and field name to be searched in schema.xml:

<field name="body" type="textTp" indexed="true" stored="true" omitNorms="true"/>

<fieldType name="textTp" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query" class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

This is my custom schema.xml and custom search field type.

3. This is the XML result when I append debugQuery=on to my search URL:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="debugQuery">on</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">+body:flyaway</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">+body:flyaway</str>
    <str name="querystring">+body:flyaway</str>
    <str name="parsedquery">+body:fly +body:away</str>
    <str name="parsedquery_toString">+body:fly +body:away</str>
    <lst name="explain"/>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">0.0</double>
      <lst name="prepare">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

I really appreciate your advice~ :)

--
View this message in context: http://lucene.472066.n3.nabble.com/How-can-i-get-collect-search-result-from-custom-filtered-query-tp1723055p1723815.html
Boosting documents based on the vote count
Hello all, I have a field in my schema which holds the number of votes a document has. How can I boost documents based on that number? Something like: the document with the maximum count gets a boost of 10, the one with the smallest count gets 0.5, and the values in between are calculated automatically. Thanks, Alexandru Badiu
Re: query between two dates
OK, maybe I don't get this right... are you trying to match check-in date > 2010-10-19T00:00:00Z AND check-out date < 2010-10-21T00:00:00Z, *or* check-in date >= 2010-10-19T00:00:00Z AND check-out date <= 2010-10-21T00:00:00Z?

On 18 October 2010 10:05, nedaha <neda...@gmail.com> wrote:
When I build a query from my application I have a date range, but in the Solr index I have separate dates. So how can I compare them to get the appropriate result?
solr requirements
Hi All, I am planning to have a separate server for Solr, and regarding hardware requirements I have a question about what configuration is needed. I know it is hard to say exactly, but I just need the minimum requirements for the following situation: 1) There are 1000 regular users of Solr, and every day each user indexes 10 files of 1 KB each, which adds up to about 10 MB per day, and it keeps growing. 2) How much RAM does Solr use in general? Thanks, satya
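The numbers in the question imply a back-of-envelope growth estimate worth making explicit; the index itself is usually some multiple of the raw input, and RAM depends mostly on caches and sort fields, so treat this only as a lower bound on storage, not a sizing answer:

```java
public class IndexGrowth {
    // Raw input volume per day, in KB: users x files/user/day x KB/file.
    static long kbPerDay(int users, int filesPerUserPerDay, int fileSizeKb) {
        return (long) users * filesPerUserPerDay * fileSizeKb;
    }

    public static void main(String[] args) {
        long kb = kbPerDay(1000, 10, 1);            // the numbers from the question
        double gbPerYear = kb / 1_000_000.0 * 365;  // decimal units, uncompressed raw input
        System.out.println(kb + " KB/day, ~" + gbPerYear + " GB/year of raw input");
    }
}
```

So roughly 10 MB/day, under 4 GB of raw input per year; at that scale almost any modern single server is storage-adequate, and memory tuning dominates.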
Re: query between two dates
The exact query that I want is: check-in date >= 2010-10-19T00:00:00Z AND check-out date <= 2010-10-21T00:00:00Z. But because of the structure I had to index, I don't have a specific start date and end date in my Solr index to compare with the check-in and check-out range; I have a set of dates that are available to reserve! Could you please help me? :)

--
View this message in context: http://lucene.472066.n3.nabble.com/query-between-two-date-tp1718566p1724062.html
Re: API for using Multi cores with SolrJ
I asked this myself; here are some pointers: http://lucene.472066.n3.nabble.com/SolrJ-and-Multi-Core-Set-up-td1411235.html http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-in-Single-Core-td475238.html

Hi everyone, I'm trying to write some code for creating and using multiple cores. Is there a method available for this purpose, or do I have to make an HTTP call to a URL such as http://localhost:8983/solr/admin/cores?action=STATUScore=core0 ? For example, if I want to create a new core named core01, then check its status, and then insert a document into core01's index, how do I do it? Any help or a document would help greatly. Thanks in advance. -- Regards, Tharindu -- http://jetwick.com twitter search prototype
Re: query between two dates
OK, I see now... well, the only query that comes to mind is something like: check-in date:[2010-10-19T00:00:00Z TO *] AND check-out date:[* TO 2010-10-21T00:00:00Z]. Would something like that work?

On 18 October 2010 11:04, nedaha <neda...@gmail.com> wrote:
The exact query that I want is: check-in date >= 2010-10-19T00:00:00Z AND check-out date <= 2010-10-21T00:00:00Z, but because of the structure I had to index I don't have a specific start date and end date in my Solr index; I have a set of dates that are available to reserve.
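Since the index stores the individual available dates rather than a start/end pair, another option (my suggestion, not something proposed in the thread) is for the application to enumerate every date of the stay and require each one to be present in the multi-valued field. A JDK-only sketch of building such a query string, reusing the tdm_avail2 field name from the sample doc:

```java
import java.time.LocalDate;
import java.util.StringJoiner;

public class AvailabilityQuery {
    // Require every date from check-in to check-out (inclusive) to appear in
    // the multi-valued date field; the explicit ANDs make all days mandatory.
    static String buildQuery(String field, LocalDate checkIn, LocalDate checkOut) {
        StringJoiner clauses = new StringJoiner(" AND ");
        for (LocalDate d = checkIn; !d.isAfter(checkOut); d = d.plusDays(1)) {
            clauses.add(field + ":\"" + d + "T00:00:00Z\"");
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("tdm_avail2",
                LocalDate.parse("2010-10-19"), LocalDate.parse("2010-10-21")));
    }
}
```

Note the sample docs store T08:29:43Z timestamps, so exact-term matching on T00:00:00Z assumes the dates are reindexed truncated to midnight; otherwise each clause would need a per-day range like tdm_avail2:[2010-10-19T00:00:00Z TO 2010-10-20T00:00:00Z].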
Re: How can I get a correct stemmed query?
rawquerystring = +body:flyaway, parsedquery = +body:fly +body:away shows that your custom filter is working as you expected. However, you are using different tokenizers at query time (StandardTokenizer, hard-coded) and index time (WhitespaceTokenizer). That may cause numFound=0. For example, if your indexed document contains 'fly, away' in its body field, your query won't return it, because of the comma. admin/analysis.jsp shows the indexed tokens. You can issue a *:* query to see if that document really exists: q=*:*&fl=body

Your query analyzer definition should look like:

<analyzer type="query" class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer"/>

You cannot have both an analyzer class and a tokenizer at the same time. Once you get this working, in your case it is better to write a custom filter factory plug-in and define the query analyzer using it (for performance reasons). And you can load your plug-in more easily: http://wiki.apache.org/solr/SolrPlugins#How_to_Load_Plugins

<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LengthFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="com.testsolr.ir.KLTQueryStemFilter"/>
</analyzer>

--- On Mon, 10/18/10, Jerad <ag...@naver.com> wrote:
From: Jerad <ag...@naver.com>
Subject: Re: How can I get a correct stemmed query?
To: solr-user@lucene.apache.org
Date: Monday, October 18, 2010, 12:14 PM
Re: Boosting documents based on the vote count
I have a field in my schema which holds the number of votes a document has. How can I boost documents based on that number? you can do it with http://wiki.apache.org/solr/FunctionQuery
Re: Boosting documents based on the vote count
I know but I can't figure out what functions to use. :) On Mon, Oct 18, 2010 at 1:38 PM, Ahmet Arslan iori...@yahoo.com wrote: I have a field in my schema which holds the number of votes a document has. How can I boost documents based on that number? you can do it with http://wiki.apache.org/solr/FunctionQuery
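A common pattern here (my suggestion, not from the thread) is to damp the raw count with a logarithm, e.g. a dismax boost function such as bf=log(sum(votes,1)); Solr's log() function is base 10, so the boost grows slowly as votes increase. A tiny JDK sketch of the same math, to see how the scaling behaves:

```java
public class VoteBoost {
    // Mirrors the function query log(sum(votes,1)); Solr's log() is base 10,
    // so 0 votes -> 0.0, 9 votes -> 1.0, 99 votes -> 2.0, and so on.
    static double boost(long votes) {
        return Math.log10(votes + 1.0);
    }

    public static void main(String[] args) {
        for (long v : new long[] {0, 9, 99, 999}) {
            System.out.println(v + " votes -> boost " + boost(v));
        }
    }
}
```

To hit specific endpoints like the 0.5-to-10 range in the original question, you would wrap this in a linear rescaling (linear() / scale() style function queries) using the known minimum and maximum vote counts.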
Implementing Search Suggestion on Solr
Hi! I'm trying to implement a kind of search suggestion on a search engine I have implemented. These search suggestions should not be automatic like the ones described for the SpellCheckComponent [1]. I'm looking for something like: "SAS oppositions" => "Public job offers for some-company". So I will have to define it manually. I was thinking about synonyms [2], but I don't know if that's the proper way to do it, because semantically those terms are not synonyms. Any ideas or suggestions? Regards, [1] http://wiki.apache.org/solr/SpellCheckComponent [2] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
Re: Term is duplicated when updating a document
Thanks. Not really the answer I wanted to hear, but at least I know this is not my fault ;) Regards, Thomas

Erick Erickson, 15.10.2010 20:42: This is actually known behavior. The problem is that when you update a document, it's deleted and re-added; the original is only marked as deleted. The terms aren't touched, so both the original and the new document's terms are counted. It would be hard, very hard, to remove the terms from the inverted index efficiently. But when you optimize, all the deleted documents (and their associated terms) are physically removed from the files, and thus your term counts change. HTH, Erick

On Fri, Oct 15, 2010 at 10:05 AM, Thomas Kellerer <spam_ea...@gmx.net> wrote: Thanks for the answer. "Which fields are modified when the document is updated/replaced?" Only one field was changed, but it was not the one the auto-suggest term comes from. "Are there any differences in the content of the fields that you are using for the AutoSuggest?" No. "Have you changed your schema.xml file recently? If you have, then there may have been changes in the way these fields are analyzed and broken down into terms." No, I did a complete index rebuild to rule out things like that. Then after startup I did a search, then updated the document, and searched again. Regards, Thomas

"This may be a bug if you did not change the field or the schema file but the term count is changing." On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer <spam_ea...@gmx.net> wrote: Hi, we update our documents (which represent products in our shop) when a dealer modifies them, by calling SolrServer.add(SolrInputDocument) with the updated document. My understanding is that there is no other way to update an existing document. However, we also use a term query to autocomplete the search field for the user, and each time a document is updated (added) the term count is incremented. So after starting with a new index the count is e.g. 1; then the document that contains that term is updated and the count is 2; the next update sets it to 3, and so on. Once the index is optimized (by calling SolrServer.optimize()) the count is correct again. Am I missing something, or is this a bug in Solr/Lucene? Thanks in advance, Thomas
Re: Virtual field, Statistics
Hello Lance, thank you for your reply. I created the following JIRA issue, as suggested: https://issues.apache.org/jira/browse/SOLR-2171. Can you tell me how new issues are handled by the development team, and whether there's a way I could help/contribute? -- Tanguy

2010/10/16 Lance Norskog <goks...@gmail.com>: Please add a JIRA issue requesting this. A bunch of things are not supported for functions: returning as a field value, for example.

On Thu, Oct 14, 2010 at 8:31 AM, Tanguy Moal <tanguy.m...@gmail.com> wrote: Dear solr-user folks, I would like to use the stats module to perform very basic statistics (mean, min and max), which is actually working just fine. Nevertheless I found a little limitation that bothers me a tiny bit: how to perform the exact same statistics on the result of a function query rather than on a field.

Example schema:
- string: id
- float: width
- float: height
- float: depth
- string: color
- float: price

What I'd like to do is something like:
select?q=price:[45.5 TO 99.99]&stats=on&stats.facet=color&stats.field={volume=product(product(width, height), depth)}

I would expect to obtain:

<lst name="stats">
  <lst name="stats_fields">
    <lst name="product(product(width,height),depth)">
      <double name="min">...</double>
      <double name="max">...</double>
      <double name="sum">...</double>
      <long name="count">...</long>
      <long name="missing">...</long>
      <double name="sumOfSquares">...</double>
      <double name="mean">...</double>
      <double name="stddev">...</double>
      <lst name="facets">
        <lst name="color">
          <lst name="white">
            <double name="min">...</double>
            <double name="max">...</double>
            <double name="sum">...</double>
            <long name="count">...</long>
            <long name="missing">...</long>
            <double name="sumOfSquares">...</double>
            <double name="mean">...</double>
            <double name="stddev">...</double>
          </lst>
          <lst name="red">
            <double name="min">...</double>
            <double name="max">...</double>
            <double name="sum">...</double>
            <long name="count">...</long>
            <long name="missing">...</long>
            <double name="sumOfSquares">...</double>
            <double name="mean">...</double>
            <double name="stddev">...</double>
          </lst>
          <!-- Other facets on other colors go here -->
        </lst>
      </lst><!-- end of statistical facets on volumes -->
    </lst><!-- end of stats on volumes -->
  </lst><!-- end of stats_fields node -->
</lst>

Of course computing the volume can be done before indexing the data, but defining virtual fields on the fly from an arbitrary function is powerful, and I am comfortable with the idea that many others would appreciate it, especially for BI needs and so on :-D Is there an easy way to do this that I haven't been able to find, or is it actually impossible? Thank you very much in advance for your help. -- Tanguy

-- Lance Norskog goks...@gmail.com
Re: SOLR DateTime and SortableLongField field type problems
Just following up to see if anybody might have some words of wisdom on the issue? Thank you, Ken

It looked like something resembling white marble, which was probably what it was: something resembling white marble. -- Douglas Adams, The Hitchhikers Guide to the Galaxy

On Fri, Oct 15, 2010 at 6:42 PM, Ken Stanley <doh...@gmail.com> wrote:

Hello all, I am using Solr 1.4.1 with the DataImportHandler, and I am trying to follow the advice from http://www.mail-archive.com/solr-user@lucene.apache.org/msg11887.html about converting date fields to SortableLong fields for better memory efficiency. However, whenever I try to do this using the DateFormatTransformer, I get exceptions when indexing for every row that tries to create my sortable fields.

In my schema.xml, I have the following definitions for the fieldType and dynamicField:

<fieldType name="sdate" class="solr.SortableLongField" indexed="true" stored="false" sortMissingLast="true" omitNorms="true"/>
<dynamicField name="sort_date_*" type="sdate" stored="false" indexed="true"/>

In my dih.xml, I have the following definitions:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="xml_stories" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor" fileName="legacy_stories.*\.xml$"
            recursive="false" baseDir="/usr/local/extracts"
            newerThan="${dataimporter.xml_stories.last_index_time}">
      <entity name="stories" pk="id" dataSource="xml_stories"
              processor="XPathEntityProcessor" url="${xml_stories.fileAbsolutePath}"
              forEach="/RECORDS/RECORD" stream="true"
              transformer="DateFormatTransformer,HTMLStripTransformer,RegexTransformer,TemplateTransformer"
              onError="continue">
        <field column="_modified_date" xpath="/RECORDS/RECORD/PROP[@NAME='R_ModifiedTime']/PVAL"/>
        <field column="modified_date" sourceColName="_modified_date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'"/>
        <field column="_df_date_published" xpath="/RECORDS/RECORD/PROP[@NAME='R_StoryDate']/PVAL"/>
        <field column="df_date_published" sourceColName="_df_date_published" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'"/>
        <field column="sort_date_modified" sourceColName="modified_date" dateTimeFormat="yyyyMMddhhmmss"/>
        <field column="sort_date_published" sourceColName="df_date_published" dateTimeFormat="yyyyMMddhhmmss"/>
      </entity>
    </entity>
  </document>
</dataConfig>

The fields in question are in the formats:

<RECORDS>
  <RECORD>
    <PROP NAME="R_StoryDate">
      <PVAL>2001-12-04T00:00:00Z</PVAL>
    </PROP>
    <PROP NAME="R_ModifiedTime">
      <PVAL>2001-12-04T19:38:01Z</PVAL>
    </PROP>
  </RECORD>
</RECORDS>

The exception that I am receiving is:

Oct 15, 2010 6:23:24 PM org.apache.solr.handler.dataimport.DateFormatTransformer transformRow WARNING: Could not parse a Date field java.text.ParseException: Unparseable date: Wed Nov 28 21:39:05 EST 2007 at java.text.DateFormat.parse(DateFormat.java:337) at org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89) at org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

I know that it has to be the SortableLong fields, because if I remove just those two lines from my dih.xml, everything imports as I expect it to. Am I doing something wrong? Misusing the SortableLong and/or the DateFormatTransformer? Is this not supported in my version of Solr? I'm not very experienced with Java, so digging into the code would be a lost cause for me right now. I was hoping that somebody here might be able to help point me in the right direction. It should be noted that the modified_date and df_date_published fields index just fine (so long as I do it as I've defined above). Thank you, - Ken
Re: indexing mysql database
Also, the little-advertised DIH debug page can help; see: solr/admin/dataimport.jsp

Best
Erick

On Sun, Oct 17, 2010 at 11:56 AM, William Pierce evalsi...@hotmail.com wrote:

Two suggestions: a) Noticed that your dih spec in the solrconfig.xml seems to refer to db-data-config.xml, but you said that your file was db-config.xml. You may want to check this to make sure that your file names are correct. b) What does your log say when you ran the import process?

- Bill

-----Original Message----- From: do3do3 Sent: Sunday, October 17, 2010 8:29 AM To: solr-user@lucene.apache.org Subject: indexing mysql database

I try to index a table in a MySQL database. First I create a db-config.xml file which contains:

<dataSource type="JdbcDataSource" name="1stTrial" Driver="com.mysql.jdbc.Driver" encoding="UTF-8" url="jdbc:mysql://localhost:3306/(database name)" user="(user)" password="(password)" batchSize="-1"/>

followed by:

<entity dataSource="1stTrial" name="(table name)" pk="id" query="select * from (table name)">

and the definition of the table fields, like:

<field column="id" name="ID"/>
<field column="Text1" name="(field name)"/>

Second, I add these fields to the schema.xml file, and finally point to the db-config.xml file in solrconfig.xml as:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

I found an index folder which contains only the segments.gen and segments_1 files, and when I try to search I get no results. Can anybody help? Thanks in advance

-- View this message in context: http://lucene.472066.n3.nabble.com/indexing-mysql-database-tp1719883p1719883.html Sent from the Solr - User mailing list archive at Nabble.com.
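For reference, a complete minimal DIH configuration file has this overall shape (a sketch only: the database, table, and field names below are placeholders, not the original poster's). Note that the file name on disk must match the one named in the config parameter in solrconfig.xml:

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" name="1stTrial"
              driver="com.mysql.jdbc.Driver" encoding="UTF-8"
              url="jdbc:mysql://localhost:3306/mydb"
              user="myuser" password="mypassword" batchSize="-1"/>
  <document>
    <entity dataSource="1stTrial" name="mytable" pk="id"
            query="select * from mytable">
      <field column="id" name="ID"/>
      <field column="Text1" name="text1"/>
    </entity>
  </document>
</dataConfig>
```

After running a full-import, the dataimport.jsp debug page mentioned above shows how many rows were fetched and how many documents were actually added, which quickly narrows down whether the problem is the SQL, the field mapping, or the config file name.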
RE: SOLR DateTime and SortableLongField field type problems
I think if you look closely you'll find the date quoted in the exception report doesn't match any of the declared formats in the schema. I would suggest, as a first step, hunting through your data to see where that date is coming from.

-Mike

-----Original Message----- From: Ken Stanley [mailto:doh...@gmail.com] Sent: Monday, October 18, 2010 7:40 AM To: solr-user@lucene.apache.org Subject: Re: SOLR DateTime and SortableLongField field type problems

Just following up to see if anybody might have some words of wisdom on the issue?

Thank you, Ken

It looked like something resembling white marble, which was probably what it was: something resembling white marble. -- Douglas Adams, The Hitchhikers Guide to the Galaxy

On Fri, Oct 15, 2010 at 6:42 PM, Ken Stanley doh...@gmail.com wrote:

Hello all, I am using SOLR-1.4.1 with the DataImportHandler, and I am trying to follow the advice from http://www.mail-archive.com/solr-user@lucene.apache.org/msg11887.html about converting date fields to SortableLong fields for better memory efficiency. However, whenever I try to do this using the DateFormatTransformer, I get exceptions when indexing for every row that tries to create my sortable fields.
In my schema.xml, I have the following definitions for the fieldType and dynamicField:

<fieldType name="sdate" class="solr.SortableLongField" indexed="true" stored="false" sortMissingLast="true" omitNorms="true" />
<dynamicField name="sort_date_*" type="sdate" stored="false" indexed="true" />

In my dih.xml, I have the following definitions:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="xml_stories" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor" fileName="legacy_stories.*\.xml$"
            recursive="false" baseDir="/usr/local/extracts"
            newerThan="${dataimporter.xml_stories.last_index_time}">
      <entity name="stories" pk="id" dataSource="xml_stories"
              processor="XPathEntityProcessor" url="${xml_stories.fileAbsolutePath}"
              forEach="/RECORDS/RECORD" stream="true"
              transformer="DateFormatTransformer,HTMLStripTransformer,RegexTransformer,TemplateTransformer"
              onError="continue">
        <field column="_modified_date" xpath="/RECORDS/RECORD/PROP[@NAME='R_ModifiedTime']/PVAL" />
        <field column="modified_date" sourceColName="_modified_date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
        <field column="_df_date_published" xpath="/RECORDS/RECORD/PROP[@NAME='R_StoryDate']/PVAL" />
        <field column="df_date_published" sourceColName="_df_date_published" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
        <field column="sort_date_modified" sourceColName="modified_date" dateTimeFormat="yyyyMMddhhmmss" />
        <field column="sort_date_published" sourceColName="df_date_published" dateTimeFormat="yyyyMMddhhmmss" />
      </entity>
    </entity>
  </document>
</dataConfig>

The fields in question are in the formats:

<RECORDS>
  <RECORD>
    <PROP NAME="R_StoryDate">
      <PVAL>2001-12-04T00:00:00Z</PVAL>
    </PROP>
    <PROP NAME="R_ModifiedTime">
      <PVAL>2001-12-04T19:38:01Z</PVAL>
    </PROP>
  </RECORD>
</RECORDS>

The exception that I am receiving is:

Oct 15, 2010 6:23:24 PM org.apache.solr.handler.dataimport.DateFormatTransformer transformRow
WARNING: Could not parse a Date field
java.text.ParseException: Unparseable date: Wed Nov 28 21:39:05 EST 2007
        at java.text.DateFormat.parse(DateFormat.java:337)
        at org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89)
        at org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

I know that it has to be the SortableLong fields, because if I remove just those two lines from my dih.xml, everything imports as I expect it to.
Re: how can i use solrj binary format for indexing?
Hi, Gora. I haven't yet tried indexing a huge amount of XML files through curl or pure Java (like post.jar). Is indexing through XML really fast? How many files did you index? And how did you do it (using curl or pure Java)?

Thanks, Gora

-- View this message in context: http://lucene.472066.n3.nabble.com/how-can-i-use-solrj-binary-format-for-indexing-tp1722612p1724645.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr requirements
Well, always get the biggest, fastest machine you can <G>... On a serious note, you're right, there's not much info to go on here. And even if there were more info, Solr performance depends on how you search your data as well as how much data you have... About the only way you can really tell is to set your system up and use the admin/statistics page to monitor your system. In particular, monitor your cache evictions etc. This page may also help: http://wiki.apache.org/solr/SolrPerformanceFactors

Best
Erick

On Mon, Oct 18, 2010 at 5:59 AM, satya swaroop satya.yada...@gmail.com wrote:

Hi All, I am planning to have a separate server for Solr, and regarding hardware requirements I have a doubt about what configuration is needed. I know it will be hard to tell, but I just need a minimum requirement for the particular situation as follows:

1) There are 1000 regular users using Solr, and every day each user indexes 10 files of 1KB each; in total that comes to 10MB per day, and it goes on...???
2) How much RAM is used by Solr in general???

Thanks,
satya
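To put rough numbers on the scenario above (a back-of-the-envelope sketch only; actual index size depends on analysis, stored fields, and norms, and RAM usage is driven mostly by caches and sorting):

```java
public class IndexGrowthEstimate {
    // 1000 users x 10 files/day x 1 KB per file
    public static long bytesPerDay() {
        return 1000L * 10 * 1024;
    }

    public static long bytesPerYear() {
        return bytesPerDay() * 365;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerDay());  // 10240000 -- roughly 10 MB/day of raw input
        System.out.println(bytesPerYear()); // 3737600000 -- roughly 3.7 GB/year
    }
}
```

Even a few years of this workload fits comfortably on commodity hardware, which is why monitoring the live admin statistics page beats trying to size the box up front.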
Re: Virtual field, Statistics
The beauty/problem with open source is that issues are picked up when somebody thinks they're important enough and has the time/energy to work on them. And that person can be you <G>... What usually happens is that someone submits a patch, various people comment on it, look it over, ask for changes or provide other feedback (e.g. "Have you considered XYZ", or "You do realize that if we implement this patch, the universe will end, don't you? <G>"). Then, after a bunch of back-and-forths, one of the committers decides that it's ready to be included in the trunk and/or the branches. The chances of the particular change you need being included in trunk go up dramatically if you provide a patch. And keep pushing (gently) on the issue.

One tip, though. Before investing a lot of time and energy in creating a patch, figure out how you expect to change the code and ask some questions (by commenting on the JIRA issue) about what you're thinking about doing. You'll often get some really valuable feedback before investing lots of time... See: http://wiki.apache.org/solr/HowToContribute for the details of getting the source, compiling, running unit tests, setting up your IDE, etc.

Best
Erick

On Mon, Oct 18, 2010 at 6:59 AM, Tanguy Moal tanguy.m...@gmail.com wrote:

Hello Lance, thank you for your reply. I created the following JIRA issue: https://issues.apache.org/jira/browse/SOLR-2171, as suggested. Can you tell me how new issues are handled by the development teams, and whether there's a way I could help/contribute?

-- Tanguy

2010/10/16 Lance Norskog goks...@gmail.com:

Please add a JIRA issue requesting this. A bunch of things are not supported for functions: returning as a field value, for example.

On Thu, Oct 14, 2010 at 8:31 AM, Tanguy Moal tanguy.m...@gmail.com wrote:

Dear solr-user folks, I would like to use the stats module to perform very basic statistics (mean, min and max), which is actually working just fine.
Nevertheless I found a little limitation that bothers me a tiny bit: how to perform the exact same statistics, but on the result of a function query rather than on a field.

Example schema:
- string : id
- float : width
- float : height
- float : depth
- string : color
- float : price

What I'd like to do is something like:

select?q=price:[45.5 TO 99.99]&stats=on&stats.facet=color&stats.field={volume=product(product(width, height), depth)}

I would expect to obtain:

<lst name="stats">
  <lst name="stats_fields">
    <lst name="(product(product(width,height),depth))">
      <double name="min">...</double>
      <double name="max">...</double>
      <double name="sum">...</double>
      <long name="count">...</long>
      <long name="missing">...</long>
      <double name="sumOfSquares">...</double>
      <double name="mean">...</double>
      <double name="stddev">...</double>
      <lst name="facets">
        <lst name="color">
          <lst name="white">
            <double name="min">...</double>
            <double name="max">...</double>
            <double name="sum">...</double>
            <long name="count">...</long>
            <long name="missing">...</long>
            <double name="sumOfSquares">...</double>
            <double name="mean">...</double>
            <double name="stddev">...</double>
          </lst>
          <lst name="red">
            <!-- same statistics as above -->
          </lst>
          <!-- Other facets on other colors go here -->
        </lst>
      </lst><!-- end of statistical facets on volumes -->
    </lst><!-- end of stats on volumes -->
  </lst><!-- end of stats_fields node -->
</lst>

Of course computing the volume can be performed before indexing the data, but defining virtual fields on the fly given an arbitrary function is powerful, and I am comfortable with the idea that many others would appreciate it. Especially for BI needs and so on... :-D Is there a way to do it easily that I have not been able to find, or is it actually impossible? Thank you very much in advance for your help.

-- Tanguy

-- Lance Norskog goks...@gmail.com
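Until a virtual-field feature like the one requested in the JIRA issue exists, the usual workaround is to precompute the derived value (here the volume) at index time and point stats.field at that real field. A small standalone sketch of the min/max/mean aggregation the stats component would then be doing (class and method names are mine, not Solr's):

```java
public class VolumeStats {
    // Each row is {width, height, depth}; volume = width * height * depth,
    // precomputed per document at index time.
    public static double[] minMaxMean(double[][] dims) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double[] d : dims) {
            double volume = d[0] * d[1] * d[2];
            min = Math.min(min, volume);
            max = Math.max(max, volume);
            sum += volume;
        }
        return new double[] { min, max, sum / dims.length };
    }

    public static void main(String[] args) {
        double[][] dims = { {2, 3, 4}, {1, 1, 1}, {5, 2, 1} };
        double[] s = minMaxMean(dims);
        System.out.println(s[0] + " " + s[1] + " " + s[2]); // min 1.0, max 24.0, mean ~11.67
    }
}
```

The trade-off is the one the thread already names: precomputation burns the function into the index, so changing the formula means reindexing.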
Re: Boosting documents based on the vote count
I know, but I can't figure out what functions to use. :)

Oh, I see. Why not just use {!boost b=log(vote)}? Or maybe scale(vote,0.5,10)?
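For intuition on why log() is a reasonable dampener here: Solr's log function is base 10, so even huge vote counts only nudge the boost (a plain-Java illustration, not Solr code; in practice you would also guard against a vote count of 0, e.g. by boosting on vote+1, since log of 0 is undefined):

```java
public class LogBoost {
    // Solr's log() function query is base 10.
    public static double boost(double votes) {
        return Math.log10(votes);
    }

    public static void main(String[] args) {
        System.out.println(boost(10));   // 1.0
        System.out.println(boost(1000)); // 3.0 -- 100x the votes, only 3x the boost
    }
}
```

scale(vote,0.5,10), by contrast, linearly maps the observed min/max vote counts onto [0.5, 10], so one runaway document with a huge vote count compresses everything else toward the bottom of the range.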
Re: how can i use solrj binary format for indexing?
On Mon, Oct 18, 2010 at 5:26 PM, Jason, Kim hialo...@gmail.com wrote:

Hi, Gora. I haven't yet tried indexing a huge amount of XML files through curl or pure Java (like post.jar). Is indexing through XML really fast? How many files did you index? And how did you do it (using curl or pure Java)? [...]

We did it through curl. There were some 3.5 million XML files, and some 60 fields in the Solr schema, with minor tokenising, though with some facets. A total of about 40GB of data. We used five Solr instances, with five cores on each instance. From what I recall, it took 6h, though we might well have been limited by the read speed of a slow network drive that held the data. If done in this way, one might need to merge the data from the various cores, a task which took us about 1.5h.

Regards,
Gora
Re: solr requirements
Hi, here is some more info about it. I use Solr to output only the file names (file ids). Here I enclose the fields in my schema.xml; presently I have only about 40MB of indexed data.

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="sku" type="textTight" indexed="true" stored="false" omitNorms="true"/>
<field name="name" type="textgen" indexed="true" stored="false"/>
<field name="manu" type="textgen" indexed="true" stored="false" omitNorms="true"/>
<field name="cat" type="text_ws" indexed="true" stored="false" multiValued="true" omitNorms="true" />
<field name="features" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="includes" type="text" indexed="true" stored="false" termVectors="true" termPositions="true" termOffsets="true" />
<field name="weight" type="float" indexed="true" stored="false"/>
<field name="price" type="float" indexed="true" stored="false"/>
<field name="popularity" type="int" indexed="true" stored="false" />
<field name="inStock" type="boolean" indexed="true" stored="false" />
<!-- The following store examples are used to demonstrate the various ways one might
     _CHOOSE_ to implement spatial. It is highly unlikely that you would ever have ALL
     of these fields defined. -->
<field name="store" type="location" indexed="true" stored="false"/>
<field name="store_lat_lon" type="latLon" indexed="true" stored="false"/>
<field name="store_hash" type="geohash" indexed="true" stored="false"/>
<!-- Common metadata fields, named specifically to match up with SolrCell metadata when
     parsing rich documents such as Word, PDF. Some fields are multiValued only because
     Tika currently may return multiple values for them. -->
<field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="subject" type="text" indexed="true" stored="false"/>
<field name="description" type="text" indexed="true" stored="false"/>
<field name="comments" type="text" indexed="true" stored="false"/>
<field name="author" type="textgen" indexed="true" stored="false"/>
<field name="keywords" type="textgen" indexed="true" stored="false"/>
<field name="category" type="textgen" indexed="true" stored="false"/>
<field name="content_type" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="last_modified" type="date" indexed="true" stored="false"/>
<field name="links" type="string" indexed="true" stored="false" multiValued="true"/>
<!-- added here content satya -->
<field name="content" type="spell" indexed="true" stored="false" multiValued="true"/>
<!-- catchall field, containing all other searchable text fields (implemented via
     copyField further on in this schema) -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true" termVectors="true"/>
<!-- catchall text field that indexes tokens both normally and in reverse for efficient
     leading wildcard queries. here satya -->
<field name="text_rev" type="text_rev" indexed="true" stored="false" multiValued="true"/>
<!-- non-tokenized version of manufacturer to make it easier to sort or group results by
     manufacturer. copied from manu via copyField here satya -->
<field name="manu_exact" type="string" indexed="true" stored="false"/>
<field name="spell" type="spell" indexed="true" stored="false" multiValued="true"/>
<!-- heere changed -->
<field name="payloads" type="payloads" indexed="true" stored="false"/>
<field name="timestamp" type="date" indexed="true" stored="false" default="NOW" multiValued="false"/>

Regards,
satya
RE: query between two date
Recommend using the pdate format for faster range queries. Here's how (or one way) to do a range query in Solr:

defType=lucene&q=some_field:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]

Does that answer your question? I don't really understand what you're trying to do with your two dates. You can of course combine range queries with operators with the standard/lucene query parser:

defType=lucene&q=some_field:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z] AND other_field:[whatever TO whatever]

There are also ways to make a query comparing the values of two fields, using function queries. But it's slightly confusing and I'm not sure that's what you want to do; I'm not really sure what you want to do. Want to give an example of exactly what input you have (from your application), and what question you are trying to answer from your index?

From: nedaha [neda...@gmail.com] Sent: Monday, October 18, 2010 5:05 AM To: solr-user@lucene.apache.org Subject: Re: query between two date

Thanks for your reply. I know about the Solr date format!! Check-in and check-out dates are in a user-friendly format that we use in our search form for the system's users; I change the format via code and then send them to Solr. I want to know how I can make a query to compare the range between check-in and check-out date with some separate different days that I have in the Solr index. For example: check-in date is 2010-10-19T00:00:00Z and check-out date is 2010-10-21T00:00:00Z. When I want to build a query from my application I have a date range, but in the Solr index I have separate dates. So how can I compare them to get the appropriate result?

-- View this message in context: http://lucene.472066.n3.nabble.com/query-between-two-date-tp1718566p1723752.html Sent from the Solr - User mailing list archive at Nabble.com.
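When the application converts its user-friendly check-in/check-out dates for Solr, the classic pitfall is formatting in the server's local timezone; Solr date strings are always UTC. A sketch of the conversion (the field name checkin_date is illustrative, not from the poster's schema):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class SolrDateRange {
    // Build a Solr range query clause from two application-side dates.
    public static String rangeQuery(String field, Calendar in, Calendar out) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // Solr dates are always UTC
        return field + ":[" + fmt.format(in.getTime())
                + " TO " + fmt.format(out.getTime()) + "]";
    }

    public static void main(String[] args) {
        Calendar in = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        in.clear(); in.set(2010, Calendar.OCTOBER, 19);
        Calendar out = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        out.clear(); out.set(2010, Calendar.OCTOBER, 21);
        System.out.println(rangeQuery("checkin_date", in, out));
        // checkin_date:[2010-10-19T00:00:00Z TO 2010-10-21T00:00:00Z]
    }
}
```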
Re: SOLR DateTime and SortableLongField field type problems
On Mon, Oct 18, 2010 at 7:52 AM, Michael Sokolov soko...@ifactory.com wrote:

I think if you look closely you'll find the date quoted in the exception report doesn't match any of the declared formats in the schema. I would suggest, as a first step, hunting through your data to see where that date is coming from. -Mike

[Note: RE-sending this because apparently in my sleepy stupor, I clicked the wrong Reply button and never sent this to the list (It's a Monday) :)]

I've noticed that date anomaly as well, and I've discovered that it is one of the gotchas of DIH: it seems to modify my date to that format. All of the dates in the data are in the correct yyyy-MM-dd'T'hh:mm:ss'Z' format. Once a value is run through dateTimeFormat, I assume it is converted into a Date object; trying to use that Date object in any other transform (i.e., using a template, or even another dateTimeFormat) results in the exception I've described (displaying the date in the incorrect format).

Thanks,
Ken Stanley
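Ken's gotcha can be reproduced with nothing but SimpleDateFormat: once the first dateTimeFormat turns the string into a java.util.Date, any later pass sees Date.toString() output (the "Wed Nov 28 21:39:05 EST 2007" style from the exception), which the ISO pattern no longer matches. A minimal standalone demonstration (class name is mine; the pattern mirrors the one from the thread):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DihDateDemo {
    // Returns true if the second parse fails, reproducing the DIH warning.
    public static boolean secondParseFails() {
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'hh:mm:ss'Z'");
        try {
            // First transform: String -> java.util.Date (succeeds).
            Date d = iso.parse("2001-12-04T19:38:01Z");
            // Any later transform stringifies the Date, yielding
            // Date.toString() form, e.g. "Tue Dec 04 19:38:01 EST 2001".
            String asString = d.toString();
            iso.parse(asString); // second transform: throws ParseException
            return false;
        } catch (ParseException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondParseFails()); // true
    }
}
```

This supports the chained-transformer diagnosis: sort_date_modified reads the already-converted modified_date column, so the second dateTimeFormat never sees an ISO string.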
Re: API for using Multi cores with SolrJ
Thanks Peter. That helps a lot. It's weird that this is not documented anywhere. :(

On Mon, Oct 18, 2010 at 3:42 PM, Peter Karich peat...@yahoo.de wrote:

I asked this myself... here could be some pointers:

http://lucene.472066.n3.nabble.com/SolrJ-and-Multi-Core-Set-up-td1411235.html
http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-in-Single-Core-td475238.html

Hi everyone, I'm trying to write some code for creating and using multi cores. Is there a method available for this purpose, or do I have to do an HTTP request to a URL such as http://localhost:8983/solr/admin/cores?action=STATUS&core=core0 ? Is there an API available for this purpose? For example, if I want to create a new core named core01, then check its status, and then insert a document into the index of core01, how do I do it? Any help or a document would help greatly. Thanks in advance.

-- Regards, Tharindu

-- http://jetwick.com twitter search prototype

-- Regards, Tharindu
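For future readers of the archive: in SolrJ the core-admin operations asked about are exposed through CoreAdminRequest. The snippet below is an untested sketch against the Solr 1.4-era API (server URL and core names are examples; it requires solrj on the classpath and a running multicore Solr, so treat every call here as an assumption to verify against the javadoc):

```java
// Untested sketch -- needs a live multicore Solr at this URL.
CommonsHttpSolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");

// Create a core named core01, then query its status.
CoreAdminRequest.createCore("core01", "core01", admin);
CoreAdminResponse status = CoreAdminRequest.getStatus("core01", admin);

// Talk to the new core directly to index a document.
CommonsHttpSolrServer core01 = new CommonsHttpSolrServer("http://localhost:8983/solr/core01");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
core01.add(doc);
core01.commit();
```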
Re: API for using Multi cores with SolrJ
On Mon, Oct 18, 2010 at 10:12 AM, Tharindu Mathew mcclou...@gmail.com wrote: Thanks Peter. That helps a lot. It's weird that this not documented anywhere. :( Feel free to edit the wiki :)
Re: how can i use solrj binary format for indexing?
Do you already have the files as Solr XML? If so, I don't think you need solrj. If you need to build SolrInputDocuments from your existing structure, solrj is a good choice. If you are indexing lots of stuff, check the StreamingUpdateSolrServer: http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html

On Sun, Oct 17, 2010 at 11:01 PM, Jason, Kim hialo...@gmail.com wrote:

Hi all, I have a huge amount of xml files for indexing. I want to index using the solrj binary format to get a performance gain, because I heard that using xml files to index is quite slow. But I don't know how to index through the solrj binary format and can't find examples. Please give some help. Thanks,

-- View this message in context: http://lucene.472066.n3.nabble.com/how-can-i-use-solrj-binary-format-for-indexing-tp1722612p1722612.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how can i use solrj binary format for indexing?
Thank you for the reply, Gora. But I still have several questions. Did you use separate indexes? If so, you indexed 0.7 million XML files per instance and merged them. Is that right? Please let me know how multiple instances and cores worked in your case.

Regards,

-- View this message in context: http://lucene.472066.n3.nabble.com/how-can-i-use-solrj-binary-format-for-indexing-tp1722612p1725679.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Disable (or prohibit) per-field overrides
You know about the 'invariant' that can be set in the request handler, right? Not sure if that will do for you or not, but it sounds related. Added recently to some wiki page somewhere, although the feature has been there for a long time. Let's see if I can find the wiki page... Ah yes: http://wiki.apache.org/solr/SearchHandler#Configuration

Markus Jelsma wrote:

Hi, Thanks for the suggestion and pointer. We've implemented it using a single regex in Nginx for now. Cheers,

: Anyone know a useful method to disable or prohibit the per-field override
: features for the search components? If not, where to start to make it
: configurable via solrconfig and attempt to come up with a working patch?

If your goal is to prevent *clients* from specifying these (while you're still allowed to use them in your defaults), then the simplest solution is probably something external to Solr -- along the lines of mod_rewrite. Internally... that would be tough. You could probably write a SearchComponent (configured to run first) that does it fairly easily -- just wrap the SolrParams in an impl that returns null anytime a component asks for a param name that starts with "f." (and excludes those param names when asked for a list of the param names). It could probably be generalized to support arbitrary rules in a way that might be handy for other folks, but it would still just be wrapping all of the params, so it would prevent you from using them in your config as well. Ultimately I think a general solution would need to be in RequestHandlerBase... where it wraps the request params using the defaults and invariants... you'd want the custom exclusion rules to apply only to the request params from the client.

-Hoss
Re: Disable (or prohibit) per-field overrides
Thanks for your reply. But as I replied to Erick's suggestion, which is quite the same: yes, we're using it, but the problem is that there can be many fields, and that means quite a large list of parameters to set for each request handler, and there can be many request handlers. It's not very practical for us to maintain such a big set of invariants. It's much easier for us to maintain a very short whitelist than a huge blacklist.

Cheers

On Monday, October 18, 2010 04:59:09 pm Jonathan Rochkind wrote:

You know about the 'invariant' that can be set in the request handler, right? Not sure if that will do for you or not, but it sounds related. Added recently to some wiki page somewhere, although the feature has been there for a long time. Let's see if I can find the wiki page... Ah yes: http://wiki.apache.org/solr/SearchHandler#Configuration

Markus Jelsma wrote:

Hi, Thanks for the suggestion and pointer. We've implemented it using a single regex in Nginx for now. Cheers,

: Anyone know a useful method to disable or prohibit the per-field
: override features for the search components? If not, where to start
: to make it configurable via solrconfig and attempt to come up with a
: working patch?

If your goal is to prevent *clients* from specifying these (while you're still allowed to use them in your defaults), then the simplest solution is probably something external to Solr -- along the lines of mod_rewrite. Internally... that would be tough. You could probably write a SearchComponent (configured to run first) that does it fairly easily -- just wrap the SolrParams in an impl that returns null anytime a component asks for a param name that starts with "f." (and excludes those param names when asked for a list of the param names). It could probably be generalized to support arbitrary rules in a way that might be handy for other folks, but it would still just be wrapping all of the params, so it would prevent you from using them in your config as well. Ultimately I think a general solution would need to be in RequestHandlerBase... where it wraps the request params using the defaults and invariants... you'd want the custom exclusion rules to apply only to the request params from the client.

-Hoss

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
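Hoss's "wrap the SolrParams" idea, combined with Markus's preference for a short whitelist, can be prototyped with plain collections before touching Solr internals: drop any parameter whose name starts with "f." unless it is explicitly allowed. A standalone sketch of just the filtering rule (class and method names are mine, not Solr's):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PerFieldParamFilter {
    // Keep a param if it is not an f.* per-field override,
    // or if it is explicitly whitelisted.
    public static Map<String, String> filter(Map<String, String> params,
                                             Set<String> whitelist) {
        Map<String, String> out = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!e.getKey().startsWith("f.") || whitelist.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", "ipod");
        params.put("f.title.facet.limit", "50");
        params.put("f.body.hl.snippets", "3");
        Set<String> allow = Collections.singleton("f.title.facet.limit");
        System.out.println(filter(params, allow).keySet());
        // prints: [q, f.title.facet.limit]
    }
}
```

Inside a SearchComponent the same predicate would live in a SolrParams wrapper's getParams/getParameterNamesIterator methods, applied only to the client-supplied params as Hoss describes.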
query pending commits?
I have an indexing pipeline that occasionally needs to check if a document is already in the index (even if not committed yet). Any suggestions on how to do this without calling <commit/> before each check? I have a list of document ids and need to know which ones are in the index (actually I need to know which ones are not in the index). I figured I would write a custom RequestHandler that would check the main Reader and the UpdateHandler reader, but it now looks like 'update' is handled directly within IndexWriter. Any ideas?

thanks
ryan
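Short of a custom handler that can see the uncommitted UpdateHandler state, one pragmatic workaround is for the indexing pipeline itself to remember the ids it has sent since the last commit; a document is then "present" if it is either in the committed index or in that pending set. A sketch of the bookkeeping (application-side code, not a Solr API; it assumes a single pipeline process):

```java
import java.util.HashSet;
import java.util.Set;

public class PendingTracker {
    private final Set<String> pending = new HashSet<String>();

    // Record an id sent to Solr; returns false if it was already pending.
    public boolean markSent(String id) {
        return pending.add(id);
    }

    public boolean isPending(String id) {
        return pending.contains(id);
    }

    // Call after a successful commit: those docs are now
    // visible to ordinary searches, so the set can be reset.
    public void onCommit() {
        pending.clear();
    }

    public static void main(String[] args) {
        PendingTracker t = new PendingTracker();
        System.out.println(t.markSent("doc1"));  // true  -- newly pending
        System.out.println(t.markSent("doc1"));  // false -- already pending
        t.onCommit();
        System.out.println(t.isPending("doc1")); // false -- committed
    }
}
```

The "not in the index" list then falls out as: ids that are neither returned by a query against the committed index nor present in the pending set.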
Commits on service after shutdown
Hi, I'm new to the mailing list. I'm implementing Solr at my current job, and I'm having some problems. I was testing the consistency of commits. I found, for example, that if we add X documents to the index (without committing) and then restart the service, the documents are committed: they show up in the results. To me this looks like an error. But when we add X documents to the index (without committing) and then kill the process and start it again, the documents don't appear. This is the behaviour I want. Is there any param to avoid the auto-committing of documents after a shutdown? Is there any param to keep those un-committed documents alive after a kill?

Thanks!

-- __ Ezequiel. Http://www.ironicnet.com
Re: Commits on service after shutdown
The documents should be implicitly committed when the Lucene index is closed. When you perform a graceful shutdown, the Lucene index gets closed and the documents get committed implicitly. When the shutdown is abrupt, as in a KILL -9, this does not happen and the updates are lost. You can use the auto-commit parameter when sending your updates so that the changes are saved right away, though this could slow down the indexing speed considerably. But I do not believe there are parameters to keep those un-committed documents alive after a kill.

On Mon, Oct 18, 2010 at 2:46 PM, Ezequiel Calderara ezech...@gmail.com wrote:

Hi, I'm new to the mailing list. I'm implementing Solr at my current job, and I'm having some problems. I was testing the consistency of commits. I found, for example, that if we add X documents to the index (without committing) and then restart the service, the documents are committed: they show up in the results. To me this looks like an error. But when we add X documents to the index (without committing) and then kill the process and start it again, the documents don't appear. This is the behaviour I want. Is there any param to avoid the auto-committing of documents after a shutdown? Is there any param to keep those un-committed documents alive after a kill? Thanks!

-- __ Ezequiel. Http://www.ironicnet.com

-- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
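For completeness, the related knob: the autoCommit block in solrconfig.xml triggers commits automatically by pending-document count or by elapsed time (the values below are illustrative, not recommendations). Note it addresses the opposite problem -- it does not change the implicit commit that happens when the index is closed on graceful shutdown:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs> <!-- commit after this many uncommitted docs -->
    <maxTime>60000</maxTime> <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```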
RE: how can i use solrj binary format for indexing?
Hi all, I have a huge amount of xml files for indexing. I want to index using the solrj binary format to get a performance gain, because I heard that using xml files to index is quite slow. But I don't know how to index through the solrj binary format and can't find examples. Please give some help. Thanks,

You might want to take a look at this section of the wiki too -- http://wiki.apache.org/solr/Solrj#Setting_the_RequestWriter

-Jon

-----Original Message----- From: Jason, Kim [mailto:hialo...@gmail.com] Sent: Monday, October 18, 2010 7:52 AM To: solr-user@lucene.apache.org Subject: Re: how can i use solrj binary format for indexing?

Thank you for the reply, Gora. But I still have several questions. Did you use separate indexes? If so, you indexed 0.7 million XML files per instance and merged them. Is that right? Please let me know how multiple instances and cores worked in your case.

Regards,

-- View this message in context: http://lucene.472066.n3.nabble.com/how-can-i-use-solrj-binary-format-for-indexing-tp1722612p1725679.html Sent from the Solr - User mailing list archive at Nabble.com.
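Putting the RequestWriter wiki pointer into code: switching SolrJ from XML to the javabin wire format is one call on the server object. The snippet below is an untested sketch for the Solr 1.4-era SolrJ API (URL and field values are examples; it requires solrj on the classpath and a running Solr, so verify the class names against the javadoc):

```java
// Untested sketch -- needs a live Solr at this URL.
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
server.setRequestWriter(new BinaryRequestWriter()); // send javabin instead of XML

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1");
doc.addField("text", "fly away");
server.add(doc);
server.commit();
```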
ApacheCon Atlanta Meetup
Is there interest in having a Meetup at ApacheCon? Who's going? Would anyone like to present? We could do something less formal, too, and just have drinks and Q&A/networking. Thoughts?

-Grant
Spell checking question from a Solr novice
Hi, I am looking for a quick solution to improve a search engine's spell-checking performance. I was wondering if anyone has tried to integrate the Google spell-check API with the Solr search engine (if possible). Google spellcheck came to my mind for two reasons. First, it is costly to clean up the data to be used as a spell-check baseline. Secondly, Google probably has the most complete set of misspelled search terms. That's why I would like to know if it is a feasible way to go.

Thanks,
Xin
Re: Commits on service after shutdown
I understand, but I want to have control over what is committed or not. In our scenario, we want to add documents to the index and maybe trigger the commit an hour later. If in the middle we have a server shutdown, or any process sends a shutdown signal to the process, I don't want those documents being committed. Should I file a bug issue or an enhancement issue? Thanks

On Mon, Oct 18, 2010 at 3:54 PM, Israel Ekpo israele...@gmail.com wrote: The documents should be implicitly committed when the Lucene index is closed. When you perform a graceful shutdown, the Lucene index gets closed and the documents get committed implicitly. When the shutdown is abrupt, as in a KILL -9, this does not happen and the updates are lost. You can use the autocommit parameter when sending your updates so that the changes are saved right away, though this could slow down the indexing speed considerably; I do not believe there are parameters to keep those uncommitted documents alive after a kill.

On Mon, Oct 18, 2010 at 2:46 PM, Ezequiel Calderara ezech...@gmail.com wrote: Hi, I'm new on the mailing list. I'm implementing Solr at my current job, and I'm having some problems. I was testing the consistency of commits. I found, for example, that if we add X documents to the index (without committing) and then restart the service, the documents are committed: they show up in the results. To me this looks like an error. But when we add X documents to the index (without committing) and then kill the process and start it again, the documents don't appear. This is the behaviour I want. Is there any param to avoid the auto-committing of documents after a shutdown? Is there any param to keep those uncommitted documents alive after a kill? Thanks! -- Ezequiel http://www.ironicnet.com

-- "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/ -- Ezequiel http://www.ironicnet.com
Re: Commits on service after shutdown
No... you would just turn autocommit off and have the thread that is doing updates to your indexes commit every hour. I'd think that this would take care of the scenario you are describing. Matt

On 10/18/2010 3:50 PM, Ezequiel Calderara wrote: [...]
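For context, the autocommit behaviour being discussed lives in solrconfig.xml under the update handler. A sketch of the relevant section (the maxDocs/maxTime values are illustrative, not from the thread); with the autoCommit block absent or commented out, which is the default, nothing is committed until the client explicitly issues a commit:

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Uncomment to let Solr commit on its own; leave it out so that
       nothing is committed until the indexing thread says so.
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
  -->
</updateHandler>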
Re: Commits on service after shutdown
But if something happens within that hour, I will have either lost the documents or committed them to the index outside of the schedule. How can I handle this scenario? I think that Solr (or Lucene) should ensure the durability (http://en.wikipedia.org/wiki/Durability_(database_systems)) of the data even when it is in an uncommitted state.

On Mon, Oct 18, 2010 at 4:53 PM, Matthew Hall mh...@informatics.jax.org wrote: [...]
RE: Spell checking question from a Solr novice
Oops, never mind. I just read the Google API policy: a limit of 1000 queries per day, and for non-commercial use only.

-Original Message- From: Xin Li Sent: Monday, October 18, 2010 3:43 PM To: solr-user@lucene.apache.org Subject: Spell checking question from a Solr novice [...]
Re: Commits on service after shutdown
I'll see if I can resolve this by adding an extra core with the same schema to hold these documents. Core0 will act as a queue and Core1 will be the real index, and a commit on Core0 will trigger an add to Core1 and its commit. That way I can be sure of not losing data. It surprises me that Solr doesn't have this feature built in. I still have to verify the performance, but it looks good to me. Anyway, any help would be appreciated.

On Mon, Oct 18, 2010 at 5:05 PM, Ezequiel Calderara ezech...@gmail.com wrote: [...]

-- Ezequiel http://www.ironicnet.com
Re: Spell checking question from a Solr novice
In general, the benefit of the built-in Solr spellcheck is that it can use a dictionary based on your actual index. If you want to use some external API, you certainly can, in your actual client app -- but then it doesn't really need to involve Solr at all anymore, does it? Is there any benefit I'm not thinking of to doing that on the Solr side, instead of just in your client app? I think Yahoo (and maybe Microsoft?) have similar APIs with more generous terms of service, but I haven't looked in a while.

Xin Li wrote: [...]
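For reference, the index-based spellchecker Jonathan mentions is wired up in solrconfig.xml roughly like this (the field name and index directory below are illustrative, not from the thread):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- build the dictionary from terms already indexed in this field -->
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>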
Re: Spell checking question from a Solr novice
I think a spellchecker based on your index has clear advantages. You can spellcheck words specific to your domain which may not be available in an outside dictionary. You can always dump the list from WordNet to get a starter English dictionary. But then it also means that misspelled words from your domain become the suggested "correct" words. Hmmm... you'll need a way to prune out such words. Even then, your own domain-based dictionary is a total go.

On Mon, Oct 18, 2010 at 1:55 PM, Jonathan Rochkind rochk...@jhu.edu wrote: [...]
Re: Spell checking question from a Solr novice
If you know the misspellings you could prevent them from being added to the dictionary with a StopFilterFactory, like so:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="misspelled_words.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
    <filter class="solr.LengthFilterFactory" min="2" max="50"/>
  </analyzer>
</fieldType>

where misspelled_words.txt contains the misspellings.

On Mon, Oct 18, 2010 at 5:14 PM, Pradeep Singh pksing...@gmail.com wrote: [...]
Schema required?
We need to index documents where the fields in the document can change frequently. It appears that we would need to update our Solr schema definition before we can reindex using new fields. Is there any way to make the Solr schema optional? --frank
I need to indexing the first character of a field in another field
Hello guys, I need to index the first character of the field "autor" in another field, "inicialautor". Example: autor = Mark Webber, inicialautor = M. I wrote a JavaScript function in the dataimport, but the field inicialautor is indexed empty. The function:

function InicialAutor(linha) {
    var aut = linha.get("autor");
    if (aut != null) {
        if (aut.length() > 0) {
            var ch = aut.charAt(0);
            linha.put("inicialautor", ch);
        } else {
            linha.put("inicialautor", "");
        }
    } else {
        linha.put("inicialautor", "");
    }
    return linha;
}

What's wrong? Thanks, Renato Wesenauer
RE: Schema required?
Hi Frank, Check out the Dynamic Fields option from here http://wiki.apache.org/solr/SchemaXml Tim -Original Message- From: Frank Calfo [mailto:fca...@aravo.com] Sent: Monday, October 18, 2010 5:25 PM To: solr-user@lucene.apache.org Subject: Schema required? We need to index documents where the fields in the document can change frequently. It appears that we would need to update our Solr schema definition before we can reindex using new fields. Is there any way to make the Solr schema optional? --frank
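To sketch the idea from that wiki page: dynamic fields let any incoming field whose name matches a pattern be accepted without declaring it individually in schema.xml. The suffix conventions below are the common ones from the example schema; adjust to taste:

<!-- Fields arriving with these suffixes need no individual declaration -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_i" type="int"    indexed="true" stored="true"/>
<dynamicField name="*_t" type="text"   indexed="true" stored="true"/>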
Admin for spellchecker?
Do we need an admin screen for spellchecker? Where you can browse the words and delete the ones you don't like so that they don't get suggested?
Re: Spell checking question from a Solr novice
You can cross-check the new words against a dictionary and keep them in the file as Jason described... What Pradeep said is true: it is always better to have suggestions related to your index than to have suggestions with no results...

On Mon, Oct 18, 2010 at 6:24 PM, Jason Blackerby jblacke...@gmail.com wrote: [...]

-- Ezequiel http://www.ironicnet.com
Re: I need to indexing the first character of a field in another field
How are you declaring the transformer in the dataconfig?

On Mon, Oct 18, 2010 at 6:31 PM, Renato Wesenauer renato.wesena...@gmail.com wrote: [...]

-- Ezequiel http://www.ironicnet.com
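For reference, a DataImportHandler script transformer is declared roughly like this; the field and function names are taken from the question, while the entity name and query are placeholders:

<dataConfig>
  <script><![CDATA[
    function InicialAutor(linha) {
      var aut = linha.get("autor");
      linha.put("inicialautor",
                (aut != null && aut.length() > 0) ? aut.charAt(0) : "");
      return linha;
    }
  ]]></script>
  <document>
    <!-- dataSource and real query omitted; only the wiring is shown -->
    <entity name="livro" transformer="script:InicialAutor" query="...">
      <field column="autor" name="autor"/>
      <field column="inicialautor" name="inicialautor"/>
    </entity>
  </document>
</dataConfig>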
Re: I need to indexing the first character of a field in another field
You can use the regular-expression-based or template transformer without writing a separate function. It's pretty easy to use.

On Mon, Oct 18, 2010 at 2:31 PM, Renato Wesenauer renato.wesena...@gmail.com wrote: [...]
Re: Admin for spellchecker?
I was thinking about that too; you would also need a way to mark a word as valid, so it doesn't get flagged as wrong.

On Mon, Oct 18, 2010 at 6:37 PM, Pradeep Singh pksing...@gmail.com wrote: [...]

-- Ezequiel http://www.ironicnet.com
Re: Schema required?
Frank Calfo wrote: [...]

No. But you can design your schema more flexibly than you are designing it now: design it in a more abstract way, so it doesn't in fact need to change when external factors change. I mean, every time you change your schema you are going to have to change any client applications that use your Solr index to look things up with the new fields too, right? You don't want to be changing your schema all the time; you want to design your schema so it doesn't need to change. Solr is not an RDBMS. You do not need to 'normalize' your data, or design your schema the same way you would for an RDBMS. Design your schema to feed your actual and potential client apps. Jonathan
Re: I need to indexing the first character of a field in another field
You can just do this with a copyField in your schema.xml instead. Copy to a field whose analyzer uses a regex filter (or similar) to keep only the first non-whitespace character (and perhaps force uppercase too if you want). That's what I'd do; it's easier, and it will also work if you index to Solr from something other than the dataimport. Renato Wesenauer wrote: Hello guys, I need to index the first character of the field autor in another field inicialautor. Example: autor = Mark Webber, inicialautor = M. I wrote a JavaScript function in the dataimport, but the field inicialautor is indexed empty. The function: function InicialAutor(linha) { var aut = linha.get('autor'); if (aut != null) { if (aut.length > 0) { var ch = aut.charAt(0); linha.put('inicialautor', ch); } else { linha.put('inicialautor', ''); } } else { linha.put('inicialautor', ''); } return linha; } What's wrong? Thanks, Renato Wesenauer
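A sketch of that copyField approach; the field and type names are made up for the example, and the optional upcasing step is left out:

```xml
<!-- Illustrative schema.xml fragments. -->
<fieldType name="first_letter" class="solr.TextField">
  <analyzer>
    <!-- Treat the whole value as one token... -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- ...then keep only the first non-whitespace character. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^\s*(\S).*$" replacement="$1"/>
  </analyzer>
</fieldType>

<field name="autor" type="string" indexed="true" stored="true"/>
<field name="inicialautor" type="first_letter" indexed="true" stored="true"/>
<copyField source="autor" dest="inicialautor"/>
```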
Re: I need to indexing the first character of a field in another field
This exact topic was just discussed a few days ago... http://search.lucidimagination.com/search/document/7b6e2cc37bbb95c8/faceting_and_first_letter_of_fields#3059a28929451cb4 My comments on when/where it makes sense to put this logic... http://search.lucidimagination.com/search/document/7b6e2cc37bbb95c8/faceting_and_first_letter_of_fields#7b6e2cc37bbb95c8 : Date: Mon, 18 Oct 2010 19:31:28 -0200 : From: Renato Wesenauer renato.wesena...@gmail.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: I need to indexing the first character of a field in another field : : Hello guys, : : I need to indexing the first character of the field autor in another field : inicialautor. : Example: :autor = Mark Webber :inicialautor = M : : I did a javascript function in the dataimport, but the field inicialautor : indexing empty. : : The function: : : function InicialAutor(linha) { : var aut = linha.get('autor'); : if (aut != null) { : if (aut.length > 0) { : var ch = aut.charAt(0); : linha.put('inicialautor', ch); : } : else { : linha.put('inicialautor', ''); : } : } : else { : linha.put('inicialautor', ''); : } : return linha; : } : : What's wrong? : : Thanks, : : Renato Wesenauer : -Hoss
Removing Common Web Page Header and Footer from All Content Fetched by Nutch
Hi All, I am indexing a web application with approximately 9500 distinct URLs and their contents using Nutch and Solr. I use Nutch to fetch the URLs and links and to crawl the entire web application, extracting the content of all pages. Then I run the solrindex command to send the content to Solr. The problem I have now is that the first 1000 or so characters of some pages and the last 400 characters of the pages are showing up in the search results. These are the contents of the common header and footer used throughout the site. The only workaround I have now is to index everything and then go through each document one at a time, removing the first 1000 characters if the Levenshtein distance between the first 1000 characters of the page and the common header is less than a certain value. The same applies to the footer content common to all pages. Is there a way to ignore certain "stop phrases", so to speak, in the Nutch configuration, based on Levenshtein or Jaro-Winkler distance, so that the parts of the fetched data matching those stop phrases are not parsed? Any useful pointers would be highly appreciated. Thanks in advance. -- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
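For what it's worth, the edit-distance check behind that workaround can be sketched in a few lines of plain Java; the class, method, and threshold names here are illustrative, not Nutch or Solr API:

```java
// Illustrative sketch: strip a near-duplicate header from fetched page text
// using Levenshtein distance. Names and threshold are placeholders.
class HeaderStripper {

    // Classic dynamic-programming edit distance, two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Drop the leading chunk when it is "close enough" to the known header.
    static String stripHeader(String page, String header, int maxDistance) {
        int n = Math.min(header.length(), page.length());
        String prefix = page.substring(0, n);
        return levenshtein(prefix, header) <= maxDistance ? page.substring(n) : page;
    }
}
```

The same check, run against the tail of the page, would handle the common footer.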
Re: How can i get collect stemmed query?
Thanks for your reply :) 1. I tested q=*:*&fl=body, and 1 doc was returned as the result, as I expected. 2. I edited my schema.xml as you instructed: <analyzer type="query" class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer"/> <!-- no filter declarations --> but no result was returned. 3. I wonder about this... Typically the tokenizer and filter flow is: 1) the input stream provides a text stream to the tokenizer or filter; 2) the tokenizer or filter gets a token, and the processed token and its offset attribute info are returned; 3) the offset attributes carry the token's position information. This is part of a typical filter source as I understand it:

public class CustomStemFilter extends TokenFilter {
    private MyCustomStemmer stemmer;
    private TermAttribute termAttr;
    private OffsetAttribute offsetAttr;
    private TypeAttribute typeAttr;
    private Hashtable<String,String> reserved = new Hashtable<String,String>();
    private int offSet = 0;

    public CustomStemFilter(TokenStream tokenStream, boolean isQuery, MyCustomStemmer stemmer) {
        super(tokenStream);
        this.stemmer = stemmer;
        termAttr = (TermAttribute) addAttribute(TermAttribute.class);
        offsetAttr = (OffsetAttribute) addAttribute(OffsetAttribute.class);
        typeAttr = (TypeAttribute) addAttribute(TypeAttribute.class);
        addAttribute(PositionIncrementAttribute.class);
        // Some of my custom logic here.
    }

    public boolean incrementToken() throws IOException {
        clearAttributes();
        if (!input.incrementToken())
            return false;
        StringBuffer queryBuffer = new StringBuffer();
        // Stemming logic here; the generated query string is appended to queryBuffer.
        termAttr.setTermBuffer(queryBuffer.toString(), 0, queryBuffer.length());
        offsetAttr.setOffset(0, queryBuffer.length());
        offSet += queryBuffer.length();
        typeAttr.setType("word");
        return true;
    }
}

※ MyCustomStemmer analyzes the input string flyaway into the query string fly +body:away and returns it. At index time, contents to be searched are normally analyzed and indexed as below.
a) Contents to be indexed: fly away. b) The token fly, with length of fly = 3 (set via the offset attribute method), is returned by the filter or analyzer. c) The next token away, with length of away = 4, is returned. I think that's the general index flow. But my custom filter generates a query string, not a token, so in the process the offset value is set to the query's length, not a single token's length. I wonder: does the value set by the offsetAttr.setOffset() method influence search results when using Solr? (I tested this in the query input box on the main admin page at http://localhost:8983/solr/admin/ ) -- View this message in context: http://lucene.472066.n3.nabble.com/How-can-i-get-collect-search-result-from-custom-filtered-query-tp1723055p1729717.html Sent from the Solr - User mailing list archive at Nabble.com.
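As an aside on that flow: the token-stream contract expects each incrementToken() call to emit one term, so a filter that decompounds flyaway would typically buffer the parts and hand them out one per call, rather than setting the whole rewritten query string as a single term. A plain-Java sketch of that buffering idea (no Lucene classes; the compound split is hard-coded purely for illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative stand-in for a decompounding filter's buffering logic:
// push() plays the role of consuming an input token, next() plays the
// role of incrementToken() emitting exactly one term per call.
class DecompoundQueue {
    private final Deque<String> pending = new ArrayDeque<String>();

    // Stand-in for the stemmer: split a known compound into its parts.
    void push(String token) {
        if (token.equals("flyaway")) {
            pending.add("fly");
            pending.add("away");
        } else {
            pending.add(token);
        }
    }

    // One term per call; null once the buffer is exhausted.
    String next() {
        return pending.poll();
    }
}
```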
Setting solr home directory in websphere
I've installed Solr a hundred times using Tomcat (on Windows) but now need to get it going with WebSphere (on Windows). For whatever reason this seems to be black magic :) I've installed the war file but have no idea how to set Solr home to let WebSphere know where the index and config files are. Can someone enlighten me on how to do this please?
Re: Setting solr home directory in websphere
You need to make sure that the following system property is one of the values specified in the JAVA_OPTS environment variable: -Dsolr.solr.home=path_to_solr_home On Mon, Oct 18, 2010 at 10:20 PM, Kevin Cunningham kcunning...@telligent.com wrote: I've installed Solr a hundred times using Tomcat (on Windows) but now need to get it going with WebSphere (on Windows). For whatever reason this seems to be black magic :) I've installed the war file but have no idea how to set Solr home to let WebSphere know where the index and config files are. Can someone enlighten me on how to do this please?
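If setting JVM arguments is awkward in WebSphere, Solr also looks up its home directory via JNDI (java:comp/env/solr/home), so an env-entry in the deployed WAR's web.xml is an alternative; the path value below is a placeholder:

```xml
<!-- web.xml fragment inside the Solr WAR; the value is a placeholder path. -->
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>C:/solr/home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
```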
snapshot-4.0 and maven
I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is this possible to do? If so, could someone give me a tip or two on getting started? Thanks, Matt
Re: snapshot-4.0 and maven
Once you've built the solr 4.0 jar, you can use mvn's install command like this: mvn install:install-file -DgroupId=org.apache -DartifactId=solr -Dpackaging=jar -Dversion=4.0-SNAPSHOT -Dfile=solr-4.0-SNAPSHOT.jar -DgeneratePom=true @tommychheng On 10/18/10 7:28 PM, Matt Mitchell wrote: I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is this possible to do? If so, could someone give me a tip or two on getting started? Thanks, Matt
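After that, the locally installed jar can be referenced from a project's pom.xml using the same coordinates passed to install-file above:

```xml
<dependency>
  <groupId>org.apache</groupId>
  <artifactId>solr</artifactId>
  <version>4.0-SNAPSHOT</version>
</dependency>
```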
Re: Spell checking question from a Solr novice
The first question to ask is will it work for you. The SECOND question is do you want google to know what's in your data? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Mon, 10/18/10, Xin Li x...@book.com wrote: From: Xin Li x...@book.com Subject: Spell checking question from a Solr novice To: solr-user@lucene.apache.org Date: Monday, October 18, 2010, 12:43 PM Hi, I am looking for a quick solution to improve a search engine's spell checking performance. I was wondering if anyone tried to integrate Google SpellCheck API with Solr search engine (if possible). Google spellcheck came to my mind because of two reasons. First, it is costly to clean up the data to be used as spell check baseline. Secondly, google probably has the most complete set of misspelled search terms. That's why I would like to know if it is a feasible way to go. Thanks, Xin
Re: ApacheCon Atlanta Meetup
I would love to go, but funds are low right now. NEXT year, I'd have something to demo though :-) Dennis Gearon --- On Mon, 10/18/10, Grant Ingersoll gsing...@apache.org wrote: From: Grant Ingersoll gsing...@apache.org Subject: ApacheCon Atlanta Meetup To: solr-user@lucene.apache.org Date: Monday, October 18, 2010, 11:58 AM Is there interest in having a Meetup at ApacheCon? Who's going? Would anyone like to present? We could do something less formal, too, and just have drinks and Q&A/networking. Thoughts? -Grant
'Advertising' a site
When I get my site which uses Solr/Lucene going, is it considered polite to post a small paragraph about it with a link? Dennis Gearon
Re: 'Advertising' a site
Hi Dennis, There is a PoweredBy page on the Wiki that's good for that. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Mon, October 18, 2010 11:35:09 PM Subject: 'Advertising' a site When I get my site which uses Solr/Lucene going, is it considered polite to post a small paragraph about it with a link? Dennis Gearon
Re: Schema required?
Solr requires a schema. But Lucene does not! :) Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Frank Calfo fca...@aravo.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Mon, October 18, 2010 5:25:27 PM Subject: Schema required? We need to index documents where the fields in the document can change frequently. It appears that we would need to update our Solr schema definition before we can reindex using new fields. Is there any way to make the Solr schema optional? --frank
Re: 'Advertising' a site
Cool, thanks! Dennis Gearon --- On Mon, 10/18/10, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: From: Otis Gospodnetic otis_gospodne...@yahoo.com Subject: Re: 'Advertising' a site To: solr-user@lucene.apache.org Date: Monday, October 18, 2010, 9:28 PM Hi Dennis, There is a PoweredBy page on the Wiki that's good for that. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Mon, October 18, 2010 11:35:09 PM Subject: 'Advertising' a site When I get my site which uses Solr/Lucene going, is it considered polite to post a small paragraph about it with a link? Dennis Gearon
Re: Removing Common Web Page Header and Footer from All Content Fetched by Nutch
Hi Israel, You can use this: http://search-lucene.com/?q=boilerpipefc_project=Tika Not sure if it's built into Nutch, though... Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Israel Ekpo israele...@gmail.com To: solr-user@lucene.apache.org; u...@nutch.apache.org Sent: Mon, October 18, 2010 9:01:50 PM Subject: Removing Common Web Page Header and Footer from All Content Fetched by Nutch Hi All, I am indexing a web application with approximately 9500 distinct URLs and their contents using Nutch and Solr. I use Nutch to fetch the URLs and links and to crawl the entire web application, extracting the content of all pages. Then I run the solrindex command to send the content to Solr. The problem I have now is that the first 1000 or so characters of some pages and the last 400 characters of the pages are showing up in the search results. These are the contents of the common header and footer used throughout the site. The only workaround I have now is to index everything and then go through each document one at a time, removing the first 1000 characters if the Levenshtein distance between the first 1000 characters of the page and the common header is less than a certain value. The same applies to the footer content common to all pages. Is there a way to ignore certain "stop phrases", so to speak, in the Nutch configuration, based on Levenshtein or Jaro-Winkler distance, so that the parts of the fetched data matching those stop phrases are not parsed? Any useful pointers would be highly appreciated. Thanks in advance.
count(*) equivilent in Solr/Lucene
Is there something in Solr/Lucene that could give me the equivalent of: SELECT COUNT(*) WHERE date_column1 > :start_date AND date_column2 > :end_date; providing I take into account deleted documents, of course (i.e., do some sort of averaging or tracking function over time). Dennis Gearon
Re: 'Advertising' a site
: There is a PoweredBy page on the Wiki that's good for that. Even better is a post to the list telling folks about your use case, index size, hardware, etc. A lot of new users find that information really helpful for comparison. -Hoss
Re: count(*) equivilent in Solr/Lucene
: : SELECT : COUNT(*) : WHERE : date_column1 > :start_date AND : date_column2 > :end_date; q=*:*&fq=column1:[start TO *]&fq=column2:[end TO *]&rows=0 ...every result includes a total count. -Hoss