Re: Adding new field after data is already indexed

2013-11-26 Thread jefferyyuan
Check  Solr: Add new fields with Default Value for Existing Documents
http://lifelongprogrammer.blogspot.com/2013/06/solr-use-doctransformer-to-change.html
  
If we only need to search and display the new fields, we can do the following
steps:
  
  1. Add the new field definition in schema.xml:
<field name="newFiled" type="tint" indexed="true" stored="true"
default="-1"/>

  2. Update the search query: when searching for the default value of this
newFiled field, also match documents that have no value:
-(-newFiled:defaultValue AND newFiled:[* TO *])
  3. Use a DocTransformer to add the default value when there is no value in
that field for old data.

Some functions may not work, such as sort and stats.
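A tiny sketch of the step-2 trick (using the field name and default value from the example above): the pure-negative clause excludes every doc whose field exists and differs from the default, which leaves docs holding the default plus docs with no value at all.

```python
def default_or_missing(field, default_value):
    # Negative query: reject docs where the field exists AND is not the
    # default; what remains is "field == default OR field missing".
    return "-(-{f}:{v} AND {f}:[* TO *])".format(f=field, v=default_value)

print(default_or_missing("newFiled", -1))
```

For example, `default_or_missing("newFiled", -1)` yields `-(-newFiled:-1 AND newFiled:[* TO *])`.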



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-new-field-after-data-is-already-indexed-tp1862575p4103440.html
Sent from the Solr - User mailing list archive at Nabble.com.


The way edismax parses colon seems weird

2013-07-19 Thread jefferyyuan
In our application, users may search for an error code like 12:34.

We define default search fields, like: <str name="qf">title^10 body_stored^8
content^5</str>
So when a user searches 12:34, we want to search for the error code in the
specified fields.

In the code, if we search q=12:34 directly, this can't find anything. That's
expected, as it searches for 34 in the field 12.

Then we try to escape the colon and search 12\:34; the parsedquery would be
+12\:34, which still can't find the expected page.
<str name="parsedquery">(+12\:34)/no_coord</str>
<str name="parsedquery_toString">+12\:34</str>
<str name="QParser">ExtendedDismaxQParser</str>

If I type 2 backslashes, it seems it can find the error page:
q=12\\:34
<str name="parsedquery">
(+DisjunctionMaxQuery((content:"12 34"^0.5 | body_stored:"(12\:34 12)
34"^0.8 | title:"12 34"^1.1)))/no_coord
</str>
<str name="parsedquery_toString">
+(content:"12 34"^0.5 | body_stored:"(12\:34 12) 34"^0.8 | title:"12
34"^1.1)
</str>
<str name="QParser">ExtendedDismaxQParser</str>

Is this a bug in Solr edismax or not?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-way-edismax-parses-colon-seems-weird-tp4079226.html


Re: The way edismax parses colon seems weird

2013-07-19 Thread jefferyyuan
Thanks very much for the reply. 
We are querying Solr directly from the browser:
http://localhost:8080/solr/select?q=12\:34&defType=edismax&debug=query&qf=content

<str name="rawquerystring">12\:34</str>
<str name="querystring">12\:34</str>
<str name="parsedquery">(+12\:34)/no_coord</str>
<str name="parsedquery_toString">+12\:34</str>
<str name="QParser">ExtendedDismaxQParser</str>

And it seems this is not related to which (default) field I use to query.
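For what it's worth, when the query is typed into a browser the backslash and colon also pass through a URL-decoding layer, which may explain why two backslashes behave differently. A sketch of building the request from code with a single query-syntax escape plus proper percent-encoding (base URL and parameters taken from the example above):

```python
from urllib.parse import urlencode

def edismax_url(base, raw_term, qf):
    # Escape the colon once for the Lucene/edismax query syntax, then let
    # urlencode percent-encode the backslash (%5C) and colon (%3A) so the
    # servlet container hands Solr exactly the single-escaped term.
    escaped = raw_term.replace(":", "\\:")
    params = {"q": escaped, "defType": "edismax", "debug": "query", "qf": qf}
    return base + "?" + urlencode(params)

print(edismax_url("http://localhost:8080/solr/select", "12:34", "content"))
```

This produces `q=12%5C%3A34` in the URL, so Solr receives `12\:34` after one round of decoding.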



--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-way-edismax-parses-colon-seems-weird-tp4079226p4079234.html


Re: Difference between IntField and TrieIntField in Lucene 4.0

2013-01-12 Thread jefferyyuan
Thanks very much, Yonik.
I should read the Javadoc of Solr's IntField and TrieIntField.

In the Javadoc of Solr's IntField, IntField is marked as a legacy field type:
A legacy numeric field type that encodes Integer values as simple Strings.
This class should not be used except by people with existing indexes that
contain numeric values indexed as Strings. New schemas should use
TrieIntField. 
Field values will sort numerically, but Range Queries (and other features
that rely on numeric ranges) will not work as expected: values will be
evaluated in unicode String order, not numeric order. 

I remember reading this a few weeks ago; today, when I discussed it with a
coworker, we looked at the javadoc. It was not what I expected.

Thanks again for your prompt reply :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Difference-between-IntField-and-TrieIntField-in-Lucene-4-0-tp4032938p4032953.html


Solr stats.facet on TrieField doesn't work

2012-12-19 Thread jefferyyuan
This seems to be a known issue:
http://wiki.apache.org/solr/StatsComponent
TrieFields has to use a precisionStep of -1 to avoid using
UnInvertedField.java. Consider using one field for doing stats, and one for
doing range facetting on.

To fix this problem and support facet search on this field, I have to
create another field with precisionStep=2147483647 (Integer.MAX_VALUE).
This is not good, as it takes more disk space, and it's hard to explain to
customers why we need this field.

It seems this problem is already reported and tracked by
https://issues.apache.org/jira/browse/SOLR-2976, but there has been no update
since 03/Jan/12.

Does Solr team have any plan to fix this problem?

The following is my test result:
I have 2 fields. One field is effectiveSize_tl, type TrieLongField,
precisionStep=8, default setting.
The other field is ctime_tdt, type TrieDateField, precisionStep=6, default
setting.

I also created 2 more fields:
effectiveSize_tlMinus, same as effectiveSize_tl, except
precisionStep=2147483647.
ctime_tdtMinus, same as ctime_tdt, except precisionStep=2147483647.
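For reference, the duplicated-field workaround might look roughly like this in schema.xml (the type name tlongMinus and the copyField are my own guesses, not the actual config from this index):

```xml
<!-- Full-precision-only variant: precisionStep=2147483647 indexes a single
     term per value, so stats.facet does not trip over the extra trie
     precision terms -->
<fieldType name="tlongMinus" class="solr.TrieLongField" precisionStep="2147483647"/>
<field name="effectiveSize_tlMinus" type="tlongMinus" indexed="true" stored="false"/>
<copyField source="effectiveSize_tl" dest="effectiveSize_tlMinus"/>
```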

http://localhost:5678/solr/select?q=*:*&rows=0&stats=true&stats.field=effectiveSize_tl&stats.facet=ctime_tdt
<str name="msg">Invalid Date String:'\#8;'</str>

http://localhost:5678/solr/select?q=*:*&rows=0&stats=true&stats.field=effectiveSize_tl
works
http://localhost:5678/solr/select?q=*:*&rows=0&stats=true&stats.field=ctime_tdt
works

This works correctly, using both precisionStep=2147483647 fields:
http://localhost:5678/solr/select?q=*:*&rows=0&stats=true&stats.field=effectiveSize_tlMinus&stats.facet=ctime_tdtMinus


This doesn't throw an error, but the result is totally incorrect:
http://localhost:5678/solr/select?q=*:*&rows=0&stats=true&stats.field=effectiveSize_tlMinus&stats.facet=ctime_tdt

http://localhost:5678/solr/select?q=*:*&rows=0&stats=true&stats.field=effectiveSize_tl&stats.facet=ctime_tdt
throws an exception:
<str name="msg">Invalid Date String:'\#8;'</str>

http://localhost:5678/solr/select?q=*:*&rows=0&stats=true&stats.field=effectiveSize_tl&stats.facet=ctime_tdtMinus
still throws an exception:
<str name="msg">Invalid Date String:' #1;#0;#0;#0; #8;t#1;#20;#0;'</str>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-stats-facet-on-TrieField-doesn-t-work-tp4028175.html


RE: Is there a way to round data when index, but still able to return original content?

2012-12-10 Thread jefferyyuan
Sorry to ask a question again, but I want to round date (TrieDate) and
TrieLongField values, and it seems they don't support configuring an
analyzer: charFilter, tokenizer, or filter.

What should I do? Now I am thinking of writing my own custom date or long
field; is there any other way? :)

Thanks :)
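Since Trie fields skip analysis entirely, one workaround (my own suggestion, not from this thread) is to round the value on the client before indexing, writing the rounded copy into a second indexed field while keeping the original in a stored field. A minimal sketch for dates:

```python
from datetime import datetime

def round_to_day(ts):
    # Zero out the time-of-day so every doc from the same day indexes the
    # same value; the caller keeps the original ts for the stored field.
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

print(round_to_day(datetime(2012, 12, 10, 13, 45, 59)))
```

An UpdateRequestProcessor on the Solr side could apply the same rounding without touching every client.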
 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-round-data-when-index-but-still-able-to-return-original-content-tp4025405p4025793.html


Monitor Deleted Event

2012-10-24 Thread jefferyyuan
When some docs are deleted from the Solr server, I want to execute some code,
for example, add a record such as {contentid, deletedat} to another Solr
server or database.

How can I do this through Solr or Lucene?

Thanks for any reply and help :)
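One client-side option (my own suggestion, not from this thread) is to periodically fetch all doc IDs and diff consecutive snapshots; IDs that disappear between polls were deleted and can be turned into the {contentid, deletedat} records described above:

```python
from datetime import datetime, timezone

def find_deleted(previous_ids, current_ids):
    # IDs present in the last snapshot but absent now were deleted since
    # the previous poll.
    return sorted(set(previous_ids) - set(current_ids))

def deletion_records(deleted_ids):
    # Build {contentid, deletedat} records for the secondary store.
    now = datetime.now(timezone.utc).isoformat()
    return [{"contentid": i, "deletedat": now} for i in deleted_ids]

print(find_deleted(["a", "b", "c"], ["a", "c"]))
```

A server-side alternative would be a custom UpdateRequestProcessor that reacts to delete commands as they arrive, avoiding the polling.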



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Monitor-Deleted-Event-tp4015624.html


Re: How to import a part of index from main Solr server(based on a query) to another Solr server and then do incremental import at intervals later(the updated index)?

2012-10-24 Thread jefferyyuan
Hi, all:

Sorry for the late response :)
Thanks for your reply.

I think Solr Replication may not help in my case, as the central server
would store all docs of all users (1000+), and on each client I only want to
copy the index of his/her docs created or changed in the last 2 weeks (for
example); after the first import, I make a delta-import each day to get the
changed or deleted index from the remote central server.

In my current implementation, I use DataImportHandler and
SolrEntityProcessor. In short:

I wrote a new request handler, ImportLocalCacheHandler, url: /importcache.
For the first import, I call
/importcache?command=full-import&from:jeffery&first_index_time={first_index_time}
In my ImportLocalCacheHandler, I build a query such as
+from:jeffery +last_modified:{first_index_time TO NOW}, and then call
/dataimport?command=full-import&query={previous_query}.
After it succeeds, I save last_index_time to a property file.
 
For the delta-import, I call /importcache?command=delta-import
In my ImportLocalCacheHandler, I build a query like
+from:jeffery +last_modified:{last_index_time TO NOW}
and call /dataimport?command=full-import&clean=false&query={previous_query}.

This imports the index of docs created or changed between last_index_time and
NOW.
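The flow above could be sketched like this (function names, parameters, and the exact query syntax are my reconstruction of the post, not the handler's real code):

```python
from urllib.parse import urlencode

def build_delta_query(user, last_index_time):
    # Only this user's docs whose last_modified falls after the previous
    # successful import.
    return "+from:{0} +last_modified:[{1} TO NOW]".format(user, last_index_time)

def delta_import_url(solr_base, user, last_index_time):
    # Hand the query to the regular DataImportHandler as a full-import with
    # clean=false, so existing local docs are kept rather than wiped.
    params = {"command": "full-import", "clean": "false",
              "query": build_delta_query(user, last_index_time)}
    return solr_base + "/dataimport?" + urlencode(params)

print(delta_import_url("http://localhost:8080/solr", "jeffery",
                       "2012-10-10T00:00:00Z"))
```

After each successful run, the caller would persist the new last_index_time, as the steps above describe.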

But now I am trying to figure out how to remove index entries from the local
cache server that are already deleted on the remote server but still exist in
the local cache.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-import-a-part-of-index-from-main-Solr-server-based-on-a-query-to-another-Solr-server-and-then-tp4013479p4015633.html


How to import a part of index from main Solr server(based on a query) to another Solr server and then do incremental import at intervals later(the updated index)?

2012-10-12 Thread jefferyyuan
I have a main Solr server (solr1) which stores indexes of all docs, and I want
to implement the following functions:
1. First make a full import of my docs updated/created recently (last 1 or 2
weeks) from solr1.
2. Make delta imports at intervals to copy the changes to my docs from solr1
to solr2 - docs may be deleted, updated, or created during this period.

-- like the functionality supported by SqlEntityProcessor to import data from
a DB to Solr.

http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor
SolrEntityProcessor can do a full-import from one Solr to another Solr
based on a query (using the query parameter in the config file), but it seems
it can't do a delta import later: there is no deltaImportQuery or deltaQuery
configuration, which is supported in SqlEntityProcessor.

I have a field last_modified which records the timestamp a doc is created
or updated.
Task 1 can easily be implemented: <entity name="sep"
processor="SolrEntityProcessor" query="+from:jeffery
+last_modified:[${dataimporter.request.start_time} TO NOW]"
url="mainsolr:8080/solr/"/>

But how can I implement incremental import with SolrEntityProcessor? It seems
SolrEntityProcessor doesn't support command=delta-import.

Thanks for any reply and help :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-import-a-part-of-index-from-main-Solr-server-based-on-a-query-to-another-Solr-server-and-then-tp4013479.html